IIIT Hyderabad Publications
Canonicalization of Neural Fields
Author: Rohith Agaram 2021702026
Date: 2024-01-20
Report no: IIIT/TH/2024/5
Advisor: Madhava Krishna

Abstract
Among the many 3D representations, coordinate-based implicit neural networks, or neural fields, have recently gained wide appreciation in 3D computer vision for their ability to represent shape and appearance with very high fidelity and accuracy. Despite these advances, it has remained challenging to build neural fields that generalize across an object category without datasets such as ShapeNet, which provide "canonicalized" object instances that are consistently aligned in 3D position and orientation (pose). Aligning objects in 3D helps many tasks generalize better, including 3D scene understanding, classification, and segmentation, and 3D pose estimation can itself be obtained by aligning objects in 3D. Methods exist for aligning 3D objects represented as point clouds or meshes. Now that neural fields offer a promising new implicit 3D representation, a method is needed to align them so that the same benefits carry over from point clouds and meshes. Unlike point clouds and meshes, however, neural fields are parametrized by deep neural networks, which are very hard to interpret. In this thesis, we present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). NeRFs describe a 3D scene as a function of density and view-dependent color. Since aligning the objects of a category depends on geometry rather than color, CaFi-Net uses density alone to align them. Canonicalization is tightly coupled with equivariant networks; drawing inspiration from 3D equivariant networks, we construct CaFi-Net as a network that is equivariant to rotations.
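The link between canonicalization and equivariance can be illustrated without any learning. If a frame estimator F is rotation-equivariant, F(RX) = R F(X), then expressing an object in its own estimated frame yields the same canonical coordinates whatever rotation R was applied to the input. The sketch below assumes the density field has already been sampled at 3D points (for example by querying a trained NeRF's density on a grid; that step is not shown) and uses density-weighted PCA as the frame estimator. This is a classical, non-learned stand-in chosen for illustration, not CaFi-Net; the helpers `random_rotation`, `equivariant_frame`, and `canonicalize` are hypothetical names. In keeping with the abstract, only density values are used as weights, never color.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix built from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))          # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:             # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def equivariant_frame(points, weights):
    """Density-weighted PCA frame (columns are axes). Illustrative stand-in, not CaFi-Net."""
    w = weights / weights.sum()
    centroid = (w[:, None] * points).sum(axis=0)
    centered = points - centroid
    cov = (w[:, None] * centered).T @ centered
    _, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    axes = vecs[:, ::-1].copy()          # largest-variance axis first
    # PCA axes are only defined up to sign; pick the sign with positive weighted third moment.
    for k in range(2):
        proj = centered @ axes[:, k]
        if (w * proj**3).sum() < 0:
            axes[:, k] = -axes[:, k]
    # Third axis from a cross product keeps the frame right-handed (det = +1).
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])
    return centroid, axes

def canonicalize(points, weights):
    """Express the points in their own estimated frame."""
    centroid, axes = equivariant_frame(points, weights)
    return (points - centroid) @ axes

# Toy "object": skewed, anisotropic samples with a density value at each point.
rng = np.random.default_rng(0)
pts = rng.exponential(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
sigma = rng.uniform(0.1, 1.0, size=500)
R = random_rotation(rng)
canon_original = canonicalize(pts, sigma)
canon_rotated = canonicalize(pts @ R.T, sigma)    # same object, arbitrary input pose
print(np.abs(canon_original - canon_rotated).max())
```

The hand-crafted frame only works when the weighted covariance has distinct eigenvalues and the third moments are non-zero; symmetric or noisy shapes break both assumptions, which is one motivation for learning an equivariant canonicalizer instead.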
CaFi-Net learns directly from continuous and noisy density fields using a Siamese network architecture. Previous work has done this for point clouds, but handling fields, specifically vector fields, requires considering rotation equivariance in both the position and the orientation of the field. To incorporate rotation equivariance for fields, we chose the gradient of the scalar density field, which is a vector field, as the signal on which CaFi-Net's rotation equivariance is built. We use spherical harmonics as the basic building block of CaFi-Net's equivariant convolution kernels. To handle the noisy signal, we weight the features by the density value at each point. We employ density-based clustering to separate the background from the foreground, and this separation is used when computing the losses. As no suitable dataset is publicly available, we built a simulator to train CaFi-Net that renders 54 omnidirectional camera views per instance for 1300 NeRF instances across 13 ShapeNet object categories. During inference, our method takes pre-trained neural radiance fields of novel object instances in arbitrary 3D poses and estimates a canonical field with a consistent 3D pose across the entire category. Because no canonicalization metrics exist for neural fields, we evaluate CaFi-Net with the same metrics used for point clouds. In addition, we introduce a new metric, Ground Truth Equivariance Consistency (GEC), which measures how consistent CaFi-Net's canonicalization is with manual labels. Extensive experiments on this dataset of 1300 NeRF models show that our method matches or exceeds the performance of 3D point cloud-based methods. Our ablation studies examine the choice of signal, weighting the equivariant features by the density value, and the need for the Siamese network, and justify the design choices of CaFi-Net.
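Why the density gradient is a convenient signal can be checked numerically. If a density field is rotated, sigma'(x) = sigma(R^T x), the chain rule gives grad sigma'(R p) = R grad sigma(p): the gradient vectors move with the point (position) and also turn with the rotation (orientation), which is exactly the two-part equivariance described above. Multiplying by the density, a rotation-invariant scalar, preserves this property. The sketch below assumes a toy anisotropic Gaussian density in place of a trained NeRF and uses central finite differences for the gradient (an actual implementation would more likely differentiate the density network directly); `density`, `numerical_gradient`, and `weighted_gradient_signal` are hypothetical helpers, and the density weighting here is only meant to mirror the idea of weighting features by density, not CaFi-Net's exact layer.

```python
import numpy as np

A = np.diag([4.0, 1.0, 0.25])            # hypothetical anisotropic toy density

def density(x, M=A):
    """sigma(x) = exp(-0.5 x^T M x), evaluated for each row of x (shape (N, 3))."""
    return np.exp(-0.5 * np.einsum("ni,ij,nj->n", x, M, x))

def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar field f at each row of x."""
    grad = np.zeros_like(x)
    for k in range(3):
        e = np.zeros(3)
        e[k] = h
        grad[:, k] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

def weighted_gradient_signal(f, x):
    """Vector signal: density gradient scaled by the density itself."""
    return f(x)[:, None] * numerical_gradient(f, x)

def random_rotation(rng):
    """Random 3x3 rotation matrix built from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(1)
p = rng.normal(size=(200, 3))            # sample positions
R = random_rotation(rng)
rotated = lambda x: density(x @ R)       # sigma'(x) = sigma(R^T x) for row vectors
lhs = weighted_gradient_signal(rotated, p @ R.T)   # signal of rotated field at rotated points
rhs = weighted_gradient_signal(density, p) @ R.T   # original signal with its vectors rotated
print(np.abs(lhs - rhs).max())
```

A scalar signal such as density alone would only be invariant at each point, carrying no orientation; the gradient supplies a direction that rotates with the object, which an equivariant network can then use to infer pose.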
In the results section, we show renderings of the neural fields of objects from their canonical pose, which are consistent across each category.
Centre for Robotics
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.