IIIT Hyderabad Publications
Behavioral Planning for Automated Vehicles using Deep Reinforcement Learning

Author: Meha Kaushik
Date: 2018-10-29
Report no: IIIT/TH/2018/74
Advisor: Madhava Krishna

Abstract

Autonomous driving is one of the most active areas of research today. Deep learning and computer vision together have shown promising results in this field. A relatively newer direction is reinforcement learning (RL), which is associated with behavior-based learning: an RL agent learns from experience by exploring its action and state spaces. In this work, a deep reinforcement learning algorithm, Deep Deterministic Policy Gradient (DDPG), is used to learn various driving behaviors. DDPG is an off-policy, continuous-control framework, and it is this off-policy nature that enables it to explore the environment efficiently. Curriculum learning and intrinsic motivation are used to take advantage of this off-policy nature. Several behavior-based agents are trained, the most important being an overtaking agent for highways and an opportunistic agent for dense, unstructured traffic scenes. For overtaking on highways, the agent learns in a curriculum-based fashion: first it learns to drive on an empty road, and then it learns overtaking strategies. The reward function is handcrafted so that the desired behavior is learned in the fewest training episodes; various experiments were conducted to design this reward, and a detailed analysis of them is provided. The learned agent is able to overtake not only on straight roads but on curved roads as well. Another behavior that was learned is blocking. Blocking is a hostile behavior and should not be practiced in real life; by slightly changing the reward function used for overtaking and by positioning the learning agent ahead of the opponent, blocking behavior was learned. The approach for overtaking and blocking is compared with an existing RL-based approach for the same tasks.
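To make the idea of a handcrafted, curriculum-friendly reward concrete, the sketch below shows one plausible shape for an overtaking reward: forward progress along the track, penalties for drifting and for leaving the lane centre, a per-opponent overtaking bonus, and a large collision penalty. This is a minimal illustration only; the function name, terms, and all coefficients are assumptions, not the exact reward formulated in the thesis.

```python
import math

def overtaking_reward(speed, track_angle, lane_offset, cars_passed, collided):
    """Illustrative reward shaping for highway overtaking (coefficients are
    assumed, not taken from the thesis).

    speed       : longitudinal speed of the agent (m/s)
    track_angle : angle between the car's heading and the track axis (radians)
    lane_offset : lateral distance from the lane centre, normalised to [-1, 1]
    cars_passed : number of opponents overtaken during this step
    collided    : whether the agent collided during this step
    """
    if collided:
        return -200.0                                # large penalty on collision
    progress = speed * math.cos(track_angle)         # reward motion along the track
    drift = -abs(speed * math.sin(track_angle))      # penalise sideways motion
    centring = -abs(lane_offset) * speed * 0.5       # penalise leaving lane centre
    overtake_bonus = 100.0 * cars_passed             # explicit incentive to pass
    return progress + drift + centring + overtake_bonus
```

In a curriculum, the same progress and lane-keeping terms can first train empty-road driving, after which the overtaking bonus shapes the second stage without changing the framework.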
For dense traffic scenes, we learn two behaviors: opportunistic and defensive. The opportunistic agent actively looks for free space ahead of it and navigates itself there, while the defensive agent stays in its own lane and changes its speed whenever a vehicle approaches, to avoid collisions. To our knowledge, there is no prior work that deals with end-to-end driving in dense unstructured traffic. The defensive behavior was not easy to learn through exploration noise alone; we used an intrinsic-motivation-based approach, explicitly showing the agent rewarding actions. For all of the behaviors, the learning is scalable and robust with respect to the speeds of the opponent vehicles and the number of cars in the surroundings.

Full thesis: pdf
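The "explicitly showing rewarding actions" idea relies on DDPG being off-policy: its replay buffer can be seeded with transitions generated by something other than the current policy, so early updates already see the desired behavior. The sketch below illustrates that mechanism under assumed names and structure; it is not the thesis's implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer for an off-policy learner like DDPG."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a training batch (capped by current buffer size).
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def seed_with_demonstrations(buffer, demo_transitions):
    """Inject demonstrated (rewarding) transitions before training begins,
    so off-policy updates are exposed to the desired behavior early."""
    for transition in demo_transitions:
        buffer.add(*transition)
```

Because the critic is trained on whatever transitions the buffer holds, mixing demonstrated transitions with the agent's own exploration steers learning toward behaviors, like defensive driving, that random noise alone rarely discovers.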