IIIT Hyderabad Publications
Learning Multi-Goal Reachability in a Humanoid Robot using Deep Reinforcement Learning

Author: Phaniteja S
Date: 2018-08-09
Report no: IIIT/TH/2018/64
Advisor: Abhishek Sarkar

Abstract

The Inverse Kinematics (IK) problem deals with finding an actuator configuration that makes the end-effector Cartesian coordinates and pose match the given coordinates and pose. In most cases, it is enough to solve IK for the given coordinates regardless of the pose. Real-time computation of inverse kinematics with dynamically stable configurations is essential in humanoid robots, as they are highly susceptible to losing balance. General inverse kinematic solvers may not guarantee real-time control of the end effectors in external coordinates while maintaining stability. This work proposes a methodology for generating joint-space trajectories of stable configurations that solve inverse kinematics using Deep Reinforcement Learning (RL). Our approach is based on exploring the entire configuration space of the robot and learning the best possible solutions using an actor-critic policy-learning algorithm for continuous action spaces, Deep Deterministic Policy Gradient (DDPG). The proposed strategy was evaluated on the highly articulated upper body of a humanoid model with 27 degrees of freedom (DoF). The trained model was able to solve inverse kinematics for the end effectors with 90% accuracy while maintaining balance in the double support phase. Following the success of learning a general IK solver for goal-reachability tasks of a single hand of the humanoid, the more challenging problem of solving IK for both hands simultaneously is taken up. To address this problem, the proposed methodology was extended with significant changes to the state vector and the reward function modelling.
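The DDPG scheme described above pairs a deterministic actor with a critic trained by temporal-difference learning, updating the actor along the critic's action gradient. A minimal NumPy sketch of one such update step, using linear approximators in place of the thesis's deep networks (all shapes, rewards, and hyperparameters here are illustrative assumptions, not the thesis configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2   # e.g. joint angles + goal offset; 2 joints

# Actor mu(s) = W_a @ s; Critic Q(s, a) = w_q . [s; a]  (linear stand-ins)
W_a = rng.normal(scale=0.1, size=(action_dim, state_dim))
w_q = rng.normal(scale=0.1, size=state_dim + action_dim)

def actor(s):
    """Deterministic policy: maps state to joint action."""
    return W_a @ s

def critic(s, a):
    """State-action value estimate."""
    return w_q @ np.concatenate([s, a])

gamma, lr = 0.99, 1e-2

def ddpg_update(s, a, r, s2):
    """One transition: TD update for the critic, then move the actor
    along dQ/da (the deterministic policy gradient)."""
    global W_a, w_q
    # Critic: reduce TD error against the bootstrapped target
    target = r + gamma * critic(s2, actor(s2))
    td_err = target - critic(s, a)
    w_q += lr * td_err * np.concatenate([s, a])
    # Actor: chain rule dQ/dW_a = (dQ/da)(da/dW_a); for a linear
    # critic, dQ/da is simply the action part of w_q
    dq_da = w_q[state_dim:]
    W_a += lr * np.outer(dq_da, s)

s = rng.normal(size=state_dim)
a = actor(s) + 0.1 * rng.normal(size=action_dim)  # exploration noise
ddpg_update(s, a, r=-1.0, s2=rng.normal(size=state_dim))
```

In the full method, target networks and a replay buffer stabilise these updates; they are omitted here to keep the sketch focused on the two gradient steps.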
The extended strategy was then evaluated on the highly articulated upper body of the same humanoid for learning multi-goal reachability tasks of both hands while maintaining stability in the double support phase. Results show that the trained model was able to solve inverse kinematics for both hands, with the articulated torso contributing to both tasks. However, DDPG was observed to be unstable: in some cases, even when training was stable initially, the cumulative reward degraded later. Consequently, this work moves on to address these instability issues using multi-task reinforcement learning. Most reinforcement learning algorithms, however, are inefficient at learning multiple tasks in complex robotic systems where the actions of different tasks are not fully separable and share a common subset. When the actions are fully separable, each task policy can be learnt independently using a separate policy network. When they are not, the policies cannot be learnt independently; instead, a compound policy with shared neural network parameters may be learnt to perform multiple tasks concurrently. Such a compound policy, however, may become biased towards one task, or the gradients from different tasks may negate each other, making learning unstable and sometimes less data-efficient. In this work, we propose a new approach for simultaneously learning multiple tasks that share a set of common actions in continuous action spaces, which we call DiGrad (Differential Policy Gradient). The proposed framework is based on differential policy gradients and can accommodate multi-task learning in a single actor-critic network. We also propose a simple heuristic in the differential policy gradient update for the case of partially separable actions, to further improve learning.
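The differential policy gradient idea can be sketched as follows: each task has its own Q-value output, the actor gradient for a task flows only through that task's action subset, and shared actions combine contributions from all tasks. The NumPy sketch below uses linear approximators and a simple averaging rule on shared actions as a stand-in for the heuristic mentioned above; the shapes, task partition, and averaging rule are all assumptions for illustration, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
state_dim = 6
n_actions = 5                       # a = [torso(1), left arm(2), right arm(2)]
shared = np.array([0])              # torso joint shared by both hand tasks
task_idx = [np.array([0, 1, 2]),    # task 0: torso + left arm
            np.array([0, 3, 4])]    # task 1: torso + right arm

# Single actor over all joints; one linear Q stand-in per task
# (a proxy for one critic network with multiple task outputs)
W_a = rng.normal(scale=0.1, size=(n_actions, state_dim))
w_q = [rng.normal(scale=0.1, size=state_dim + n_actions)
       for _ in task_idx]

def digrad_actor_update(s, lr=1e-2):
    """Differential update: accumulate each task's dQ_k/da over that
    task's action subset only, averaging on the shared actions."""
    global W_a
    grad_a = np.zeros(n_actions)
    for k, idx in enumerate(task_idx):
        dq_da = w_q[k][state_dim:]    # dQ_k/da for the linear critic
        grad_a[idx] += dq_da[idx]     # restrict to this task's actions
    grad_a[shared] /= len(task_idx)   # heuristic: average shared actions
    W_a += lr * np.outer(grad_a, s)   # single shared actor update

digrad_actor_update(rng.normal(size=state_dim))
```

Because the non-shared joints receive gradients from only their own task, the two reaching tasks cannot negate each other there; interference is confined to the shared torso actions, where the averaging heuristic keeps either task from dominating.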
To demonstrate the efficiency of the framework, the proposed architecture was tested on the given humanoid for learning multi-goal reachability tasks (each end effector has a different goal), and the results were compared with those of DDPG. Training results show that DiGrad converges faster than DDPG and removes the instability in training. Finally, a comparative analysis of different settings of DiGrad, along with DDPG, is presented.

Full thesis: pdf

Centre for Robotics
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.