IIIT Hyderabad Publications
Navigating the Multiverse: Enhancing Robotic Assistance through Multi-Object Navigation and Object Location Optimization

Author: Ahana Datta (2019111007)
Date: 2024-06-01
Report no: IIIT/TH/2024/70
Advisor: Madhava Krishna

Abstract

Embodied AI, where artificial agents interact with their environment through sensors and actuators, holds immense potential for real-world applications such as robotic assistance. Efficient object navigation and locating strategies are crucial for robotic assistance in real-world environments. However, existing methodologies often struggle to adapt to dynamic environments and to incorporate human-like reasoning for optimal decision-making. This thesis aims to bridge these gaps by addressing two fundamental challenges in embodied AI: Multi-Object Navigation (MultiON) and optimal object location within household environments.

First, we tackle MultiON, where a robot is tasked with localizing multiple instances of diverse object classes in dynamic environments. This is a fundamental task for an assistive robot in a home or a factory. Existing methods have viewed this task as a direct extension of Object Navigation (ON), the task of localizing a single instance of an object class, and are pre-sequenced, i.e., the sequence in which the object classes are to be explored is provided in advance. This is a strong limitation in practical applications characterized by dynamic changes. We present Sequence-Agnostic MultiON (SAM), the task of locating one instance each of multiple object classes in a household environment in no pre-defined order. We present a deep reinforcement learning framework with an actor-critic architecture and a reward specification that exploits past experiences and rewards progress towards individual as well as multiple target object classes. Using photo-realistic scenes from the Gibson benchmark dataset in the AI Habitat 3D simulation environment, we experimentally show that our method outperforms a pre-sequenced approach and a state-of-the-art ON method extended to MultiON.

Next, we present CLIPGraphs, a novel method for determining the best room in which to place or find objects within home environments. Existing approaches predominantly rely on large language models (LLMs) or reinforcement learning (RL) policies, neglecting commonsense domain knowledge. CLIPGraphs effectively integrates domain knowledge, data-driven methods, and multimodal learning to ascertain object-room affinities. Specifically, it (a) encodes a knowledge graph of prior human preferences about the room location of different objects in home environments, (b) incorporates vision-language features to support multimodal queries based on images or text, and (c) uses a graph network to learn object-room affinities from embeddings of the prior knowledge and the vision-language features. We demonstrate that our approach provides better estimates of the most appropriate locations for objects from a benchmark set of object categories compared with state-of-the-art baselines.

Full thesis: pdf

Centre for Robotics
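As an informal illustration of the sequence-agnostic reward idea summarized in the abstract, the sketch below combines a dense progress term (reduction in geodesic distance to the nearest remaining target, which leaves the visiting order free) with a bonus for each newly localized target class. All names and constants here (sam_reward, slack, found_bonus) are hypothetical assumptions for exposition, not the reward specification from the thesis.

```python
def sam_reward(prev_dists, curr_dists, newly_found,
               slack=-0.01, found_bonus=2.5):
    """Illustrative sequence-agnostic reward sketch.

    prev_dists, curr_dists: geodesic distances to each still-unfound
                            target before and after the step.
    newly_found:            number of target classes localized this step.
    slack:                  small per-step penalty encouraging efficiency.
    """
    progress = 0.0
    if prev_dists and curr_dists:
        # Dense shaping: reward any reduction in distance to the
        # *closest* remaining target, regardless of class order.
        progress = min(prev_dists) - min(curr_dists)
    return progress + found_bonus * newly_found + slack

# Example: the agent moved 0.4 m closer to its nearest target and
# localized one target class this step.
r = sam_reward([3.2, 5.0], [2.8, 5.1], newly_found=1)
```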
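Similarly, here is a minimal sketch of the CLIPGraphs recipe under stated assumptions: CLIP features of object and room nodes are refined by a small graph network using a normalized adjacency matrix derived from the prior-preference knowledge graph, and object-room affinity is read off as embedding similarity. The two-layer depth, hidden sizes, and cosine scoring (AffinityGNN, object_room_affinity) are illustrative choices, not the architecture from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffinityGNN(nn.Module):
    """Toy graph network refining CLIP node features (assumed design)."""
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.lin1 = nn.Linear(dim, hidden)
        self.lin2 = nn.Linear(hidden, hidden)

    def forward(self, x, adj):
        # x:   (N, dim) CLIP features of object and room nodes
        # adj: (N, N) normalized adjacency from the knowledge graph
        h = F.relu(self.lin1(adj @ x))   # one round of message passing
        h = self.lin2(adj @ h)           # second round
        return F.normalize(h, dim=-1)    # unit embeddings for scoring

def object_room_affinity(emb, obj_idx, room_idx):
    # Cosine similarity between refined node embeddings.
    return (emb[obj_idx] * emb[room_idx]).sum(-1)

# Example with random stand-in features for 10 nodes:
x = torch.randn(10, 512)
adj = torch.eye(10)                      # placeholder adjacency
emb = AffinityGNN()(x, adj)
score = object_room_affinity(emb, obj_idx=3, room_idx=7)
```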