IIIT Hyderabad Publications
Monocular Layout Estimation for Autonomous Driving

Author: Kaustubh Mani
Date: 2021-03-06
Report no: IIIT/TH/2021/26
Advisor: Madhava Krishna

Abstract

In this thesis, we address the novel and highly challenging problem of estimating the layout of a complex urban driving scenario. Given a single color image captured from a driving platform, we aim to predict the bird's eye view layout of the road, lanes, sidewalks, and other traffic participants. The estimated layout should reason beyond what is visible in the image and compensate for the loss of 3D information due to perspective projection. We dub this problem amodal scene layout estimation, as it involves hallucinating the scene layout even for parts of the world that are occluded in the image.

First, we present MonoLayout, a deep neural network for real-time amodal scene layout estimation from a single image. MonoLayout maps a color image of a scene into a multi-channel semantic occupancy grid in bird's eye view, where each channel represents the occupancy probabilities of a particular scene component, and it leverages adversarial feature learning to "hallucinate" plausible completions for occluded image parts. We extend several state-of-the-art approaches for road-layout estimation and vehicle occupancy estimation in bird's eye view to the amodal setup and thoroughly evaluate against them. By leveraging temporal sensor fusion to generate training labels, we significantly outperform the current art on a number of datasets.

Next, we provide a comprehensive benchmark for the task of amodal layout estimation on the KITTI and Argoverse datasets, and we shift our attention toward estimating finer-grained attributes such as lanes, crosswalks, and vehicles. To this end, we introduce AutoLay, a new dataset for amodal layout estimation in bird's eye view. AutoLay includes precise annotations of (amodal) layouts for 44 sequences from the KITTI dataset.
In addition to fine-grained attributes such as lanes, sidewalks, and vehicles, we also provide detailed semantic annotations for 3D point clouds. To foster reproducibility and further research in this nascent area, we open-source implementations of several baselines and the current art. Further, we propose VideoLayout, a real-time neural network architecture that leverages temporal information from monocular video to produce more accurate and consistent layouts. VideoLayout achieves state-of-the-art performance on the AutoLay benchmark while running in real time. We demonstrate the potency of the aforementioned methods through several ablation studies and also show applications of amodal layout estimation to downstream tasks.

Full thesis: pdf

Centre for Robotics
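To make the output representation described above concrete, here is a minimal sketch (not the thesis code) of a multi-channel bird's-eye-view semantic occupancy grid, where each channel stores per-cell occupancy probabilities for one scene class. The class set and grid resolution below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Assumed class set and BEV resolution, for illustration only.
CLASSES = ["road", "sidewalk", "vehicle"]
GRID_H, GRID_W = 128, 128

# The layout is a (num_classes, H, W) grid of occupancy probabilities in [0, 1].
layout = np.zeros((len(CLASSES), GRID_H, GRID_W), dtype=np.float32)

# Mark a hypothetical vertical road strip as highly likely "road".
road = CLASSES.index("road")
layout[road, :, 48:80] = 0.9

# Per-cell hard labels can be recovered by taking the argmax over channels.
hard_labels = layout.argmax(axis=0)  # (H, W) array of class indices
print(layout.shape, hard_labels.shape)
```

In practice a network such as MonoLayout would predict this grid from an image encoder's features; the key point is that occluded cells still receive probabilities, which is what makes the layout amodal.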