SHAPE PRIORS FOR MONOCULAR OBJECT LOCALIZATION IN DYNAMIC SCENES

Author: Jatavallabhula Krishna Murthy
Date: 2017-09-07
Report no: IIIT/TH/2017/75
Advisor: Madhava Krishna

Abstract

We tackle the problem of reconstructing moving vehicles in autonomous driving scenarios using only a monocular camera. Though the problem appears to be ill-posed, we demonstrate that prior knowledge about how 3D shapes of vehicles project to an image can be used to reason about the reverse process, i.e., how shapes back-project from 2D to 3D. We encode this knowledge in shape priors, which are learnt from a small dataset of annotated RGB images of vehicles. Each shape prior comprises a deformable wireframe model whose vertices are semantically unique keypoints of that vehicle.

The first contribution is an approach for reconstructing vehicles from a single RGB image. To obtain a 3D wireframe representing the shape, we first localize the vertices of the wireframe (keypoints) in 2D using a Convolutional Neural Network (CNN). We then formulate a shape-aware optimization problem that uses the learnt shape priors to lift the detected 2D keypoints to 3D, thereby recovering the 3D pose and shape of a query object from an image. The shape-aware adjustment robustly recovers shape (the 3D locations of the detected keypoints) while simultaneously filling in occluded keypoints. To limit the estimation errors caused by erroneously detected keypoints, we use an Iteratively Re-weighted Least Squares (IRLS) scheme for robust optimization, and as a by-product characterize noise models for each predicted keypoint. We evaluate our approach on autonomous driving benchmarks and report results superior to those of existing monocular as well as stereo approaches.

The second contribution is a monocular object localization system that estimates the shape and pose of dynamic objects in real time, using video frames captured from a moving monocular camera. Here again, by incorporating prior knowledge of the object category, we can obtain more detailed instance-level reconstructions.
Unlike earlier object model specifications, the proposed shape-prior model leads to the formulation of a Bundle Adjustment-like optimization problem for simultaneous shape and pose estimation. We demonstrate how the detected keypoints can be used to recover 3D object properties while accounting for 2D localization errors and self-occlusion. We show significant performance improvements over state-of-the-art monocular competitors for 2D keypoint detection, as well as for 3D localization and reconstruction of dynamic objects.
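
The abstract names Iteratively Re-weighted Least Squares (IRLS) as the robust optimization scheme but does not give its formulation. The general idea can be illustrated with a minimal sketch on a linear problem. This is not the thesis's shape-and-pose objective: the linear model, the Huber weight function, the threshold `delta`, and the function name `irls` are all assumptions chosen for illustration.

```python
import numpy as np


def irls(A, b, n_iters=20, delta=1.0, eps=1e-8):
    """Robustly solve A x ~ b by IRLS with Huber weights (illustrative only).

    Returns the estimate x and the final per-measurement weights.
    """
    # Start from the ordinary least-squares solution.
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    w = np.ones(len(b))
    for _ in range(n_iters):
        r = A @ x - b                        # current residuals
        abs_r = np.maximum(np.abs(r), eps)   # guard against division by zero
        # Huber weights: 1 for small residuals, delta/|r| for large ones.
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)
        sw = np.sqrt(w)
        # Weighted least squares: scale each row by sqrt(weight).
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x, w


# Toy usage: fit a line y = 2 t + 1 with one grossly wrong measurement.
t = np.arange(10, dtype=float)
A = np.column_stack([t, np.ones_like(t)])
b = 2.0 * t + 1.0
b[5] += 50.0                                 # simulated outlier
x_robust, weights = irls(A, b)
print(x_robust, weights[5])
```

In this toy setting the measurement with the gross error ends up with a small weight, so it has little influence on the fit. Such per-measurement weights give a rough indication of how reliable each measurement is, which is loosely analogous to the per-keypoint noise characterization mentioned above.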

Full thesis: pdf

Centre for Robotics

IIIT Hyderabad Publications
