IIIT Hyderabad Publications
3D Representation Learning Endowed by Optimal Transport

Author: Siddharth Katageri (2021701018)
Date: 2024-03-19
Report no: IIIT/TH/2024/42
Advisor: Charu Sharma

Abstract

Consider an autonomous agent capable of obeying the instruction "Go and clean the coffee spilled on the dining table." To execute this instruction perfectly, the agent needs a precise understanding of its dynamic 3D environment. The final task can be broken down into the following sub-tasks: 3D grounding, 3D semantic understanding, and 3D motion planning. 3D representation learning plays an important role in excelling at all of these. Motivated to contribute towards systems that can perceive and act in real 3D worlds, in this thesis we propose two novel methods for 3D representation learning.

The first part of this thesis focuses on unsupervised domain adaptation (UDA) for 3D point clouds. Differences in point cloud acquisition procedures manifest as significant domain discrepancies and geometric variations among both similar and dissimilar classes. Standard domain adaptation methods developed for images do not translate directly to point cloud data because of its complex geometric nature. Existing works have mainly focused on designing a self-supervised task to improve adaptation performance. We propose a new UDA architecture for point cloud classification that benefits from multimodal contrastive learning to achieve better class separation in each domain individually. Further, optimal transport is used to learn the source and target data distributions jointly, reducing cross-domain shift and providing better alignment. We conduct a comprehensive empirical study on PointDA-10 and GraspNetPC-10 and show that our method achieves state-of-the-art performance on GraspNetPC-10 (by a margin of ≈ 4-12%) and the best average performance on PointDA-10.
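The optimal-transport alignment described above can be illustrated with a minimal entropic OT (Sinkhorn) sketch. This is a hypothetical, simplified example for intuition only, not the thesis implementation: `sinkhorn_plan` and its parameters are assumed names, and real systems would operate on learned features with a library such as POT.

```python
# Hypothetical sketch: entropic optimal transport (Sinkhorn iterations)
# producing a soft matching between source and target feature sets.
# Not the thesis code; pure-Python for illustration.
import math

def sinkhorn_plan(src, tgt, eps=0.1, iters=200):
    """Entropic OT plan between two equal-weight point sets (lists of tuples)."""
    n, m = len(src), len(tgt)
    # Squared-Euclidean cost between every source/target feature pair.
    cost = [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in tgt] for x in src]
    # Gibbs kernel: small eps concentrates mass on cheap (nearby) pairs.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0 / n] * n, [1.0 / m] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match uniform marginals.
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

plan = sinkhorn_plan([(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)])
```

Each row of the plan sums to its source point's mass (1/n), and more mass flows between geometrically close pairs, which is the sense in which OT "aligns" the two distributions.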
Our ablation studies and decision-boundary analysis also validate the significance of our contrastive learning module and OT alignment.

In the second part of the thesis, we explore learning Wasserstein embeddings of point clouds for multiple downstream tasks. Since the quality of learned embeddings depends largely on the capacity of the target space, we propose to embed point clouds as discrete probability distributions in Wasserstein space. We build a contrastive learning setup to learn Wasserstein embeddings that can serve as a pre-training method, with or without supervision, for any downstream task. We show that the features captured by Wasserstein embeddings better preserve point cloud geometry, including both global and local information, resulting in higher-quality embeddings. We perform exhaustive experiments and demonstrate the effectiveness of our method for point cloud classification, transfer learning, segmentation, and interpolation tasks over multiple datasets, including synthetic and real-world objects. We also compare against recent methods that use Wasserstein space and show that ours outperforms them on all downstream tasks. Additionally, our study reveals a promising interpretation of capturing critical points of point clouds that makes the proposed method self-explainable. We hope this work motivates future research on utilizing optimal transport for understanding the real 3D world, and that the self-supervised approaches proposed in this thesis act as a step in this direction.

Full thesis: pdf

Centre for Visual Information Technology
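For intuition on why a Wasserstein metric preserves geometry better than a pointwise comparison, consider the one-dimensional special case, where the Wasserstein-1 distance between two equal-size empirical distributions reduces to matching sorted samples. This is a generic illustrative sketch under that 1-D assumption, not the thesis method (which works with multi-dimensional point clouds in a learned Wasserstein space).

```python
# Illustrative sketch (not the thesis implementation): Wasserstein-1 between
# two equal-size 1-D empirical distributions equals the mean absolute
# difference of their sorted samples, i.e. the cheapest mass transport.
def wasserstein_1d(xs, ys):
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Shifting a point set by 1 moves every unit of mass by 1, so the distance
# is exactly the shift; the metric tracks the geometry of the support.
d = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

A bin-wise histogram comparison would miss this: non-overlapping supports look maximally different no matter how far apart they are, whereas the Wasserstein distance grows with the actual displacement.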
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.