IIIT Hyderabad Publications
Deep Learning based Scene Agnostic Image Based Visual Servoing

Author: Y V S Harish
Date: 2021-05-21
Report no: IIIT/TH/2021/52
Advisor: Madhava Krishna

Abstract

In the era of autonomous navigation, visual servoing plays a pivotal role in tasks that involve vision, control, machine learning and embedded systems. Deep learning has evolved hand-in-hand with advancements in various technical fields and has become part and parcel of current technological inventions. This thesis aims at improving the current state of research in image-based visual servoing using deep learning architectures. The proposed pipelines attempt to carry out visual servoing in unforeseen environments, thus making them scene agnostic. We also focus on developing an unsupervised control mechanism that forecasts over a fixed time horizon and helps achieve faster target-driven visual navigation. We put forward two architectures.

In the first architecture (chapter 3), we propose a two-fold solution: (i) we consider optical flow as our visual features, which are predicted using a deep neural network; (ii) these flow features are then systematically integrated with depth estimates provided by another neural network using the interaction matrix. We further present an extensive benchmark in a photo-realistic 3D simulation across diverse scenes to study the convergence and generalisation of visual servoing approaches. On our challenging benchmark, we show convergence for camera transformations of over 3 m and 40 degrees while maintaining precise positioning of under 2 cm and 1 degree, whereas existing approaches fail to converge for a majority of scenarios beyond 1.5 m and 20 degrees. Furthermore, we also evaluate our approach in a real scenario on an aerial robot. Our approach generalizes to novel scenarios, producing precise and robust servoing performance for 6-degrees-of-freedom positioning tasks even with large camera transformations, without any retraining or fine-tuning.
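The flow-plus-depth integration described above follows the classical IBVS control law, where per-point interaction matrices built from image coordinates and depth map a feature error to a 6-DoF camera velocity. The sketch below is a minimal illustration of that standard formulation (not the thesis's exact pipeline): the function names are hypothetical, the predicted optical flow stands in for the feature error s − s*, and the depths stand in for the depth network's estimates.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Classical interaction (feature Jacobian) matrix for one image
    point (x, y) in normalized coordinates at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(points, depths, flow, gain=0.5):
    """6-DoF camera twist (vx, vy, vz, wx, wy, wz) from flow features.

    points: (N, 2) normalized image points
    depths: (N,) depth estimates (e.g. from a depth network)
    flow:   (N, 2) feature error; here the predicted optical flow
            from the current image to the goal image
    """
    # Stack per-point interaction matrices into a (2N, 6) Jacobian.
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(points, depths)])
    e = np.asarray(flow).reshape(-1)
    # v = -lambda * pinv(L) * e : least-squares velocity toward the goal.
    return -gain * np.linalg.pinv(L) @ e
```

With zero flow (current and goal images already aligned) the commanded velocity is zero, which is the expected fixed point of the servo loop.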
In the second architecture (chapter 4), we present a deep model predictive visual servoing framework (Deep MPC) that can achieve precise alignment with optimal trajectories and can generalize to novel environments. Our framework consists of a deep network for optical flow prediction, whose outputs are used along with a predictive model to forecast future optical flow. To generate an optimal set of velocities, we present a control network that can be trained on-the-fly without any supervision. This work bridges the gap between classical and learning-based control for 6-DoF image-based visual servoing (IBVS) in novel environments with continuous action and state spaces. We select optical flow to represent our states instead of working directly with images; the dense optical flow is predicted using deep neural networks. Subsequently, we formulate a predictive model for forecasting the evolution of states given a sequence of actions (velocities). We then learn a recurrent control network on-the-fly, in an unsupervised fashion, to compute an optimal set of velocities for a given goal state. We show superior performance vis-a-vis deep visual servoing methods owing to the receding-horizon controller, even as the framework generalizes to new environments without needing retraining or fine-tuning. The controller regresses to a continuous space of outputs over 6 DoF. Through extensive simulations in photo-realistic indoor settings of the popular Habitat framework, we show significant performance gains from the proposed formulation vis-a-vis recent state-of-the-art methods. Specifically, we show faster convergence and improved trajectory length over recent approaches.

Full thesis: pdf

Centre for Robotics
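The receding-horizon idea underlying the Deep MPC framework can be illustrated with a deliberately simplified sketch: score candidate velocity sequences under a predictive model, execute only the first velocity of the best sequence, then re-plan. This toy version uses random shooting in place of the thesis's learned recurrent control network, and `forecast` is a hypothetical stand-in for the flow-forecasting predictive model.

```python
import numpy as np

def random_shooting_mpc(forecast, state, goal, horizon=5, samples=64,
                        v_scale=0.1, rng=None):
    """One receding-horizon step by random shooting.

    forecast(state, v) -> next state: any predictive model, e.g. a
    flow-forecasting network (here an arbitrary callable).
    Returns the first 6-DoF velocity of the best-scoring sequence.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    best_cost, best_v0 = np.inf, np.zeros(6)
    for _ in range(samples):
        # Sample a candidate sequence of 6-DoF velocities.
        vs = v_scale * rng.standard_normal((horizon, 6))
        s, cost = state, 0.0
        for v in vs:
            s = forecast(s, v)
            cost += np.sum((s - goal) ** 2)  # flow/feature error to goal
        if cost < best_cost:
            best_cost, best_v0 = cost, vs[0]
    # Execute best_v0, observe the new state, then re-plan: this
    # repeated re-planning is the receding-horizon loop.
    return best_v0
```

The thesis replaces the sampling step with a recurrent control network optimized on-the-fly without supervision, which regresses velocities in a continuous 6-DoF action space rather than selecting among random candidates.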
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.