IIIT Hyderabad Publications
Multi-task Reinforcement Learning for shared action spaces in Robotic Systems

Author: Parijat Dewangan
Date: 2018-12-18
Report no: IIIT/TH/2018/84
Advisor: Abhishek Sarkar

Abstract

Motion planning is an important primitive for a robotic or mechanical system with closed kinematic chains: it lets the robot find the shortest or otherwise optimal path to its goal position. Sampling-based methods provide an efficient solution for motion planning in basic robot systems. However, complex robotic systems such as humanoids have a large number of degrees of freedom (DoF), which makes motion planning extremely difficult. Further, these systems also have constraints such as self-collision avoidance and active balancing, and incorporating these constraints into existing motion planners is complex and computationally expensive. Real-time calculation of inverse kinematics (IK) with dynamically stable configurations is therefore of high necessity in humanoid robots, as they are highly susceptible to losing balance.

In the first part of this thesis, a methodology to generate joint-space trajectories of stable configurations for motion planning using deep reinforcement learning (RL) is proposed. The methodology is based on Deep Deterministic Policy Gradient (DDPG), where the robot autonomously learns the optimal behavior through a series of trial-and-error interactions with the environment. The proposed strategy was evaluated on various motion planning tasks for the highly articulated upper body of a humanoid model with 27 DoF.

Most reinforcement learning algorithms are inefficient for learning multiple tasks in complex robotic systems where actions between different tasks are not fully separable and share a set of actions. For cases where task-specific actions are fully separable, each task policy can be learnt independently using a separate policy network.
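The DDPG update mentioned above can be illustrated with a toy example. The sketch below is a simplified, hypothetical setup (not the thesis implementation): a one-parameter linear actor and a known quadratic critic, showing how the deterministic policy gradient chains the critic's action gradient through the actor's parameters.

```python
import numpy as np

# Toy deterministic-policy-gradient update (illustrative assumptions only).
# Actor: mu(s) = w * s, with a single learnable weight w.
# Critic: Q(s, a) = -(a - K*s)**2, a known quadratic whose maximum is
# at the "correct" action K*s, so no critic learning is needed here.

K = 2.0          # assumed optimal action gain
w = 0.0          # actor parameter, learned by gradient ascent on Q
lr = 0.05        # actor learning rate

rng = np.random.default_rng(0)
for _ in range(500):
    s = rng.uniform(-1.0, 1.0)     # sampled state
    a = w * s                      # deterministic action from the actor
    dQ_da = -2.0 * (a - K * s)     # critic gradient w.r.t. the action
    dmu_dw = s                     # actor output gradient w.r.t. its weight
    w += lr * dQ_da * dmu_dw       # chain rule: ascend Q through the actor
```

After training, `w` converges toward `K`, i.e. the actor has learned to emit the action that maximizes the critic. In the full algorithm the critic is itself a learned network with target networks and a replay buffer, which this sketch omits.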
However, when the task-specific actions are not fully separable, the policies cannot be learnt independently. In such environments, a compound policy with shared neural network parameters may be learnt that performs multiple tasks concurrently. But such a compound policy may become biased towards one task, or the gradients from different tasks may negate each other, making learning unstable and sometimes less data efficient. The main contribution of this work is a novel framework for simultaneous learning of multiple tasks sharing a set of common actions in continuous action spaces, which we call DiGrad (Differential Policy Gradient). The proposed framework is based on differential policy gradients (DPG) and can accommodate multi-task learning in a single actor-critic network. We also propose a simple heuristic in the differential policy gradient update for the case of partially separable actions, to further improve learning. The experimental results show that the framework supports efficient multi-task learning in complex robotic systems, outperforming related methods in continuous action spaces such as DDPG.

With the advent of artificial intelligence and machine learning, humanoid robots are made to learn a variety of skills that humans possess. One of the fundamental skills which humans use in day-to-day activities is performing tasks with coordination between both hands. For humanoids, learning such skills requires optimal motion planning, including avoiding collisions with the surroundings. In the final part of this thesis, a methodology based on DiGrad to learn coordinated tasks in cluttered environments is proposed. Further, we propose an algorithm to smooth the joint-space trajectories obtained by the proposed framework, in order to reduce the noise instilled during training.
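The shared-action idea behind DiGrad can be sketched in a toy setting. Everything below is an illustrative assumption (the quadratic per-task "critics", the action-dimension masks, the learning rate); the point it demonstrates is that shared action dimensions accumulate gradients from every task that uses them, while task-specific dimensions are updated by their own task alone.

```python
import numpy as np

# Two tasks over a 3-dim action: a[0] is shared by both tasks,
# a[1] is used only by task 0, a[2] only by task 1.
targets = np.array([[1.0, 2.0, 0.0],    # critic-0 optimum (ignores a[2])
                    [1.0, 0.0, -1.0]])  # critic-1 optimum (ignores a[1])
masks = np.array([[1, 1, 0],            # action dims each task uses
                  [1, 0, 1]], dtype=float)

a = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = np.zeros(3)
    for k in range(2):
        # toy per-task critic: Q_k(a) = -||mask_k * (a - target_k)||^2
        grad += masks[k] * (-2.0) * (a - targets[k])
    a += lr * grad  # shared dim a[0] accumulates both tasks' gradients
```

Here the two tasks agree on the shared dimension, so summing gradients helps; when tasks pull a shared action in opposite directions, the gradients partially cancel, which is the instability the thesis's heuristic for partially separable actions is meant to mitigate.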
The proposed framework was tested on a 27 DoF humanoid with an articulated torso performing a coordinated object-reaching task with both hands in four different environments with varying levels of difficulty.

Full thesis: pdf

Centre for Robotics
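The abstract does not specify the trajectory-smoothing algorithm. As a hedged illustration only, a per-joint moving-average filter is one simple way to attenuate the noise that training instils in joint-space trajectories:

```python
import numpy as np

def smooth(traj, window=5):
    """Moving average over time for a (T, n_joints) trajectory.

    Illustrative choice, not necessarily the thesis's algorithm;
    'same' mode keeps the length, at the cost of edge effects.
    """
    kernel = np.ones(window) / window
    return np.column_stack(
        [np.convolve(traj[:, j], kernel, mode="same")
         for j in range(traj.shape[1])]
    )

# Hypothetical noisy two-joint trajectory for demonstration
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
clean = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
smoothed = smooth(noisy)
```

Away from the trajectory endpoints, the filtered signal tracks the underlying motion with a smaller error than the raw noisy trajectory.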