IIIT Hyderabad Publications
3D Shape Analysis: Reconstruction and Classification
Author: Sai Sagar Jinka
Date: 2023-05-24
Report no: IIIT/TH/2023/58
Advisor: Avinash Sharma

Abstract

The reconstruction and analysis of 3D objects by computational systems has been an intensive and long-standing research problem in the graphics and computer vision communities. Traditional acquisition systems are largely restricted to studio environments, which require multiple synchronized and calibrated cameras. The advent of active depth sensors, such as time-of-flight and structured-light sensors, has made 3D acquisition feasible. This advancement has paved the way for many research problems, such as 3D object localization, recognition, classification and reconstruction, which demand sophisticated and elegant solutions to match their ever-growing applications. 3D human body reconstruction, in particular, has wide applications such as virtual mirrors and gait analysis. Lately, with the advent of deep learning, 3D reconstruction from monocular images has garnered significant interest in the research community, as it can be applied to in-the-wild settings.

We initially explored the classification of 3D rigid objects, owing to the availability of the ShapeNet datasets. In this thesis, we propose an efficient characterization of 3D rigid objects that takes local geometric features into account while constructing global features in a deep learning setup. We introduce learnable B-Spline surfaces in order to sense complex geometrical structures (large curvature variations). The locations of these surfaces are initialized over the voxel space and are learned during the training phase, leading to efficient classification performance. Later on, we primarily focus on the more challenging problem of non-rigid 3D human body reconstruction from monocular images. In this context, this thesis presents three principal approaches to the 3D reconstruction problem.
Firstly, we propose a disentangled solution in which the shape and the texture of the 3D body are recovered using two different networks. We recover the volumetric shape of a non-rigid human body from a single-view RGB image, followed by orthographic texture view synthesis using the respective depth projection of the reconstructed (volumetric) shape and the input RGB image.

Secondly, we propose PeeledHuman - a novel shape representation of the human body that is robust to self-occlusions. PeeledHuman encodes the human body as a set of Peeled Depth and RGB maps in 2D, obtained by performing ray-tracing on the 3D body model and extending each ray beyond its first intersection. We learn these Peeled maps in an end-to-end generative adversarial fashion using our novel framework - PeelGAN. PeelGAN enables us to predict the shape and color of the 3D human in an end-to-end fashion with significantly low inference time.

Finally, we further improve PeelGAN by introducing a shape prior while reconstructing from monocular images. We propose a sparse and efficient fusion strategy to combine a parametric body prior with the non-parametric PeeledHuman representation. The parametric body prior enforces geometrical consistency on the body shape and pose, while the non-parametric representation models loose clothing and handles self-occlusions. We also leverage the sparseness of the non-parametric representation for faster training of our network while using losses on 2D maps.

We evaluate our proposed methods extensively on a number of datasets. In this thesis, we also introduce the 3DHumans dataset, a life-like 3D dataset of human body scans with rich geometrical and textural details. It covers a wide variety of clothing styles, ranging from loose robed clothing such as the saree to relatively tight-fitting shirts and trousers. The dataset consists of around 150 male and 50 female unique subjects, with about 180 male scans and around 70 female scans in total.
In terms of regional diversity, for the first time, we capture body shape, appearance and clothing styles for the South-Asian population. This dataset will be released for research purposes.

Full thesis: pdf
Centre for Visual Information Technology
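
The peeled-map idea described in the abstract (following each ray beyond its first intersection and recording every surface hit as a separate 2D layer) can be illustrated with a small sketch. This is a toy example under stated assumptions, not the thesis implementation: the scene is a set of non-overlapping spheres standing in for a body mesh, the rays are assumed to be orthographic along +z, only depth is peeled (no RGB maps), and the function name `peel_depth_maps` and all of its parameters are hypothetical.

```python
import numpy as np


def peel_depth_maps(centers, radii, resolution=64, num_layers=4):
    """Toy orthographic depth peeling for non-overlapping spheres.

    One ray per pixel travels along +z through a [-1, 1] x [-1, 1] image
    plane. Layer k holds the z value of the k-th surface hit along the ray;
    pixels with fewer than k+1 hits hold NaN in that layer.
    """
    coords = np.linspace(-1.0, 1.0, resolution)
    xs, ys = np.meshgrid(coords, coords)  # each of shape (res, res)

    hits = []
    for (cx, cy, cz), r in zip(centers, radii):
        # Squared half-chord of the ray inside the sphere; <= 0 means a miss.
        disc = r**2 - (xs - cx) ** 2 - (ys - cy) ** 2
        half = np.sqrt(np.where(disc > 0, disc, np.nan))
        hits.append(cz - half)  # ray enters the sphere
        hits.append(cz + half)  # ray leaves the sphere

    hits = np.stack(hits, axis=-1)  # (res, res, 2 * num_spheres)
    if hits.shape[-1] < num_layers:
        # Pad with NaN so the output always has num_layers channels.
        pad = num_layers - hits.shape[-1]
        hits = np.pad(hits, ((0, 0), (0, 0), (0, pad)), constant_values=np.nan)

    # np.sort places NaN last, so real hits come first in front-to-back order.
    hits = np.sort(hits, axis=-1)
    return hits[..., :num_layers]


# Two spheres, one behind the other: the rear sphere is fully occluded in a
# single depth map, but appears in the third and fourth peeled layers.
layers = peel_depth_maps(
    centers=[(0.0, 0.0, -0.5), (0.0, 0.0, 0.5)],
    radii=[0.3, 0.3],
    resolution=65,
    num_layers=4,
)
print(layers.shape)
print(layers[32, 32])
```

In this toy setup the first layer is an ordinary depth map, while the later layers carry the surfaces that the first hit hides, which is the property that makes a peeled representation useful under self-occlusion.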