Dynamic Road Scene Understanding

Author: Mahtab Sandhu
Date: 2020-11-27
Report no: IIIT/TH/2020/114
Advisor: Madhava Krishna

Abstract

In this thesis, we develop a higher-level understanding of dynamic road scenes by leveraging the information in the evolving spatial relations between the objects of interest in the scene. We first propose a novel motion clustering formulation over spatio-temporal depth images obtained from stereo sequences, which segments multiple motion models in the scene in an unsupervised manner. The motion models are obtained at frame rates comparable to the speed of the stereo depth computation. This is possible due to a decoupled framework that first delineates spatial clusters and then assigns motion labels to each of these clusters by analysing a novel motion graph model. The edge weights of the motion graph are computed in a principled manner to capture the relative shear and stretch between candidate clusters, which yields a high-fidelity segmentation of the motion models in the scene. This fidelity is borne out by accuracies of up to 89.61% on KITTI and on complex native sequences.

In the second part of the thesis, we develop a semantically meaningful awareness of dynamic road scenes for understanding the on-road behavior of vehicles. Humans typically decompose a dynamic scene into vehicles exhibiting behaviors such as "moving away from me", "moving towards me", "changing lane" and the like. Such higher-level scene awareness can affect the performance of algorithms lower in the hierarchy. For example, a SLAM or state estimation algorithm can benefit from knowing that a car is "parked", since features from a parked car are indeed useful for estimating relative camera motion. In this part, we decompose a road scene into vehicle behaviors such as "parked", "following lane - moving away", "following lane - coming towards" and similar labels.
We accomplish this through Multi-Relational Graph Convolutional Networks (MR-GCN), which are proving to be an apt architecture for learning the behaviors of agents or objects from their evolving relations with other agents or objects in the scene. We show high-fidelity behavior prediction for vehicles across a variety of datasets, such as Cityscapes, KITTI, ApolloScape, and an Indian dataset. More importantly, we show effective model transfer from one dataset to another, verifying generality and repeatability across a diverse and rich set of scenes from around the world.
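The abstract does not spell out the MR-GCN update, so what follows is only a minimal sketch, assuming relational message passing in the style of R-GCN: each node (a vehicle or a static scene object) combines its own features with relation-specific transformations of its neighbours' features, averaged per relation, and a linear readout then picks a behavior label. The relation names (`left_of`, `right_of`), the helper functions `mrgcn_layer` and `classify`, the feature vectors, and all weights are hypothetical illustrations, not the thesis's architecture or trained parameters.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # shape: out_dim x in_dim


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by an in_dim vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def mrgcn_layer(
    features: Dict[str, Vector],
    edges: List[Tuple[str, str, str]],  # (source, relation, target), hypothetical relations
    w_self: Matrix,
    w_rel: Dict[str, Matrix],  # one weight matrix per relation type
) -> Dict[str, Vector]:
    """One relational message-passing step (R-GCN style sketch):

    h_i' = ReLU(W_self h_i + sum_r sum_{j in N_r(i)} (1 / |N_r(i)|) W_r h_j)
    """
    # Group the incoming neighbours of each node by relation type.
    incoming: Dict[Tuple[str, str], List[str]] = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    out: Dict[str, Vector] = {}
    for node, h in features.items():
        acc = matvec(w_self, h)  # self-connection term
        for rel, w in w_rel.items():
            nbrs = incoming.get((node, rel), [])
            for j in nbrs:
                msg = matvec(w, features[j])
                # Mean aggregation: normalise by the neighbour count for this relation.
                acc = [a + m / len(nbrs) for a, m in zip(acc, msg)]
        out[node] = relu(acc)
    return out


def classify(h: Vector, w_out: Matrix, labels: List[str]) -> str:
    """Linear readout followed by argmax over hypothetical behavior labels."""
    scores = matvec(w_out, h)
    return labels[max(range(len(scores)), key=scores.__getitem__)]


if __name__ == "__main__":
    # Toy scene with arbitrary weights: a lane marking related to a car.
    feats = {"car": [1.0, 0.0], "lane": [0.0, 1.0]}
    edges = [("lane", "left_of", "car")]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    w_rel = {"left_of": [[0.0, 1.0], [1.0, 0.0]], "right_of": identity}
    out = mrgcn_layer(feats, edges, identity, w_rel)
    print(out["car"])  # [2.0, 0.0]
    labels = ["parked", "moving_away", "coming_towards"]
    print(classify(out["car"], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], labels))  # parked
```

Keeping a separate weight matrix per relation type is what distinguishes this from a plain GCN: a neighbour related by `left_of` and one related by `right_of` contribute through different transformations, so the same neighbour can push a vehicle's representation in different directions depending on how it is positioned relative to it. A model over evolving scenes would apply such layers over relations computed at each time step.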


Centre for Robotics
