Fine Pose Estimation and Region Proposals from a Single Image

Author: Sudipto Banerjee
Date: 2018-03-24
Report no: IIIT/TH/2018/15
Advisor: Anoop M Namboodiri

Abstract

Understanding the precise 3D structure of an environment is one of the fundamental goals of computer vision, and it is challenging due to factors such as appearance variation, illumination, pose, noise, occlusion and scene clutter. A generic solution to the problem is ill-posed because depth information is lost during imaging. In this thesis, we consider a specific but common situation in which the scene contains known objects. Given 3D models of a set of known objects and an image of a cluttered scene, we detect these objects in the image and align the 3D models to their images to find their exact pose. We pose this as a 3D-to-2D alignment problem, and we also address pose estimation of 3D articulated objects in images. To find the pose of an object, we use a hierarchical approach: we first obtain an initial estimate of the pose and then refine it using a robust algorithm. Obtaining a good initial estimate is crucial, as the refinement depends entirely on it. We evaluate the proposed method on the BigBird dataset and our own tabletop dataset, and present experimental comparisons with state-of-the-art methods.

Estimating object proposals (region proposals) from an image is a well-known but difficult task, and it becomes harder in the presence of object-object interaction and background clutter. We tackle the problem with a robust Convolutional Neural Network based method that learns object proposals in a supervised manner. As we need region proposals at the object level, we solve the problem of instance-level semantic segmentation, where each pixel in the image is classified into one of the known classes and two pixels are labelled differently if they belong to two different instances of the same class. We present quantitative and qualitative comparisons of our proposed network models with previous approaches, and show results on the challenging PASCAL VOC dataset.
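
The coarse-then-refine idea behind the hierarchical pose estimation can be illustrated with a toy sketch. This is not the method developed in the thesis; it is a minimal example under strong assumptions: the 2D-3D point correspondences are known, the translation is known, the only unknown is a rotation about the vertical axis, and the camera is an ideal pinhole with a hypothetical focal length of 500 pixels. All function names are illustrative.

```python
import numpy as np

def rot_y(theta):
    """Rotation matrix for an angle theta (radians) about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(points, theta, t, f=500.0):
    """Pinhole projection of N x 3 model points under yaw theta and translation t."""
    cam = points @ rot_y(theta).T + t
    return f * cam[:, :2] / cam[:, 2:3]

def reproj_error(points, observed, theta, t):
    """Mean squared 2D distance between projected model points and observations."""
    diff = project(points, theta, t) - observed
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def estimate_yaw(points, observed, t, coarse_steps=36, refine_rounds=5):
    """Coarse grid search over yaw, followed by successively narrower local searches."""
    # Coarse stage: evenly spaced hypotheses over the full circle.
    step = 2.0 * np.pi / coarse_steps
    candidates = np.linspace(-np.pi, np.pi, coarse_steps, endpoint=False)
    best = min(candidates, key=lambda th: reproj_error(points, observed, th, t))
    # Refinement stage: search a shrinking window centred on the current best.
    for _ in range(refine_rounds):
        candidates = np.linspace(best - step, best + step, 21)
        best = min(candidates, key=lambda th: reproj_error(points, observed, th, t))
        step /= 5.0
    return float(best)
```

The sketch also shows why the initial estimate matters: the refinement only searches a window around the coarse result, so a poor coarse estimate leaves the true pose outside the region that the refinement explores.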

Full thesis: pdf

Centre for Visual Information Technology

IIIT Hyderabad Publications
