IIIT Hyderabad Publications
Streamlining Warehouse Operations: Monocular Multi-View Layout Estimation and Intelligent Visual Servoing for Robotic Tasks

Author: Pranjali Pramod Pathre (2019112002)
Date: 2024-06-21
Report no: IIIT/TH/2024/87
Advisor: Madhava Krishna

Abstract

This thesis is motivated by the need to improve efficiency and precision in warehouse management systems. The first part introduces MVRackLay, which uses multi-view analysis to estimate the complex layouts of racks and shelves in warehouses and produces a 3D rendering of the scene from a single monocular camera. The second part introduces Imagine2Servo, which extends visual servoing algorithms by generating intermediate goal images with diffusion-based image editing. This enables precise control in warehouse object-reaching tasks as well as in long-range navigation and manipulation. Both contributions are validated in real-world experiments and advance warehouse automation and robotic control, with potential applications in other domains.

In the first part of this thesis (Chapter 3), we present MVRackLay, a monocular multi-view layout estimation method for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts across multiple views, where each layer corresponds to the layout of one shelf within a rack. Given a sequence of images of a warehouse scene, the model outputs the segmented racks and, for each shelf in a rack, its front-view and top-view layouts. MVRackLay outperforms its single-view counterparts in layout accuracy, as measured by the mean IoU and mAP metrics. We also demonstrate multi-view stitching of the 3D layouts, which yields a representation of the warehouse scene in a global reference frame, similar to a scene rendering produced by a SLAM pipeline.
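The abstract reports layout accuracy in terms of mean IoU. As an illustration only, the sketch below computes a mean IoU over a stack of per-shelf layout layers, assuming each layer is a binary occupancy grid (for example, the top-view layout of one shelf) and that the mean is taken over layers. The function name `layered_mean_iou` and this averaging convention are assumptions, not the evaluation protocol used in the thesis.

```python
import numpy as np


def layered_mean_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean IoU over shelf layers (hypothetical convention).

    pred and target are arrays of shape (layers, H, W); nonzero cells are
    treated as occupied. Layers that are empty in both maps are skipped.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    ious = []
    for p, t in zip(pred.astype(bool), target.astype(bool)):
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # shelf absent in both prediction and ground truth
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


# Toy example: three 2x2 shelf layers, the last one empty in both maps.
pred = np.array([[[1, 0], [0, 0]], [[0, 1], [1, 1]], [[0, 0], [0, 0]]])
target = np.array([[[1, 1], [0, 0]], [[0, 0], [1, 1]], [[0, 0], [0, 0]]])
print(layered_mean_iou(pred, target))
```

In the toy example the first layer scores 1/2 and the second 2/3, so the function returns 7/12 (about 0.583); the third layer is skipped rather than counted as a perfect or zero score, which is one of several reasonable conventions.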
To the best of our knowledge, MVRackLay is the first work to produce a 3D rendering of a warehouse scene in terms of its semantic components (racks, shelves and objects) from a single monocular camera.

In the second part of this thesis (Chapter 4), we introduce Imagine2Servo, which uses diffusion-based image editing to generate intermediate goal images for visual servoing algorithms. Visual servoing, the control of robot motion through feedback from visual sensors, has advanced significantly with the integration of optical flow-based methods. However, its applicability is still limited by three constraints: a target image must be available at test time, the initial and target images must overlap substantially, and feedback comes from a single camera. Imagine2Servo relaxes these constraints, enabling tasks such as long-range navigation and manipulation without predefined goal images. We demonstrate it on a range of warehouse, navigation and manipulation tasks. The proposed pipeline synthesizes subgoal images grounded in the task at hand, which enables servoing when the initial and target images overlap only minimally and allows feedback from multiple cameras to be integrated for complete task execution. These contributions show a new application of image generation to robotic control that broadens the capabilities of visual servoing systems. Real-world experiments validate the effectiveness and versatility of the Imagine2Servo framework across a variety of tasks.

Centre for Robotics
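The central idea of Imagine2Servo is to servo through a sequence of intermediate subgoals rather than directly to a distant target. The sketch below illustrates only that control structure, using the classical image-based visual servoing (IBVS) law v = -λ L⁺ (s - s*) on point features. This is a swapped-in textbook technique: the thesis builds on optical flow-based servoing and generates subgoal images with a diffusion model, and neither is reproduced here. The subgoals are hand-written feature positions standing in for generated images, depths are assumed known and constant, features are propagated with the idealized model ds/dt = L v, and all function names are hypothetical.

```python
import numpy as np


def interaction_matrix(points: np.ndarray, depths: np.ndarray) -> np.ndarray:
    """Stack the standard 2x6 IBVS Jacobian of each normalized point (x, y) at depth Z."""
    rows = []
    for (x, y), z in zip(points, depths):
        rows.append([-1.0 / z, 0.0, x / z, x * y, -(1.0 + x * x), y])
        rows.append([0.0, -1.0 / z, y / z, 1.0 + y * y, -x * y, -x])
    return np.array(rows)


def ibvs_velocity(points, goal, depths, gain=0.5) -> np.ndarray:
    """Classical IBVS law: 6-DoF camera twist v = -gain * pinv(L) @ (s - s*)."""
    points = np.asarray(points, dtype=float)
    error = (points - np.asarray(goal, dtype=float)).reshape(-1)
    return -gain * np.linalg.pinv(interaction_matrix(points, depths)) @ error


def servo_through_subgoals(start, subgoals, depths, gain=0.5, dt=0.1,
                           tol=1e-3, max_steps=500):
    """Servo to each (hypothetical) subgoal in order.

    Features are propagated with the idealized model s += dt * L v,
    holding depths fixed; returns final features and steps per subgoal.
    """
    s = np.asarray(start, dtype=float).copy()
    steps_per_subgoal = []
    for goal in subgoals:
        goal = np.asarray(goal, dtype=float)
        steps = 0
        while np.linalg.norm(s - goal) > tol and steps < max_steps:
            v = ibvs_velocity(s, goal, depths, gain)
            s = s + dt * (interaction_matrix(s, depths) @ v).reshape(s.shape)
            steps += 1
        steps_per_subgoal.append(steps)
    return s, steps_per_subgoal


# Three non-collinear points at unit depth; two hand-written subgoals.
start = np.array([[0.1, 0.1], [-0.1, 0.1], [0.0, -0.1]])
subgoals = [start + np.array([0.05, 0.02]), start + np.array([0.1, 0.0])]
final, steps = servo_through_subgoals(start, subgoals, np.ones(3))
print(steps, np.linalg.norm(final - subgoals[-1]))
```

With three non-collinear points away from singular configurations, the 6×6 interaction matrix is invertible, so each step in this idealized model shrinks the feature error by the constant factor (1 - gain·dt) and the loop reaches each subgoal in turn. On a real robot, the error would instead come from image measurements (for example, flow between the current image and the generated subgoal image), and depths would have to be estimated.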
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.