IIIT Hyderabad Publications |
Understanding Learning in Multi-Agent Reinforcement Learning

Author: Kinal Mehta
Date: 2023-10-20
Report no: IIIT/TH/2023/149
Advisor: Pawan Kumar

Abstract

Reinforcement Learning (RL) has witnessed remarkable advancements in both algorithms and engineering, enabling a wide range of exciting applications. Multi-Agent Reinforcement Learning (MARL) in particular has made significant progress, enabling multiple learning entities to interact effectively. Among the challenges that remain are learning under sparse rewards, achieving social generalization by adapting to the changing behaviours of other agents, and ensuring reproducibility. This thesis tackles two important challenges in MARL: (1) learning in multi-agent sparse reward environments and (2) reproducibility with social generalization.

In the first part, we address the issue of learning a reliable critic in multi-agent sparse reward scenarios. The exponential growth of the joint action space with the number of agents, coupled with reward sparsity and environmental noise, poses significant hurdles for accurate learning. To mitigate these challenges, we propose regularizing the critic with spectral normalization (SN). Our experiments demonstrate that the regularized critic exhibits improved robustness, enabling faster learning even in complex multi-agent scenarios. These findings highlight the importance of critic regularization for stable learning.

In the second part, we introduce marl-jax, a powerful software package for MARL that focuses on training and evaluating the social generalization of agents. Built on DeepMind's JAX ecosystem and leveraging their RL framework, marl-jax supports cooperative and competitive environments with multiple agents acting simultaneously. It provides an intuitive command-line interface for training agent populations and evaluating their generalization capabilities. Researchers interested in exploring social generalization in MARL can leverage marl-jax as a reliable baseline.
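Spectral normalization, as used above to regularize the critic, rescales a weight matrix by its largest singular value, bounding the Lipschitz constant of the corresponding layer. A minimal sketch in Python/NumPy of the standard technique (the function names and power-iteration implementation here are illustrative, not taken from the thesis):

```python
import numpy as np

def spectral_norm(W, n_iters=20):
    """Estimate the largest singular value of W via power iteration."""
    rng = np.random.default_rng(0)
    v = rng.normal(size=W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = W @ v          # left singular vector estimate
        u /= np.linalg.norm(u)
        v = W.T @ u        # right singular vector estimate
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

def apply_spectral_norm(W, n_iters=20):
    """Rescale W so its spectral norm is 1, as in spectral normalization."""
    return W / spectral_norm(W, n_iters)
```

In practice (e.g. in the original SN formulation for GANs), the singular-vector estimates are cached across training steps so a single power iteration per update suffices.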
In conclusion, this thesis addresses two crucial challenges in RL: learning in multi-agent sparse reward scenarios and reproducibility for social generalization in MARL. By introducing spectral normalization as a regularization technique and providing the marl-jax software package, this research contributes to enhancing stability, robustness, social generalization and reproducibility in RL.

Full thesis: pdf

Centre for Security, Theory and Algorithms