IIIT Hyderabad Publications
Scalable Planning and Learning for Decentralized MDPs With Event Driven Rewards

Author: Tarun Gupta
Date: 2018-12-10
Report no: IIIT/TH/2018/87
Advisor: Praveen Paruchuri

Abstract

Decentralized MDPs (Dec-MDPs) provide a rigorous framework for collaborative multi-agent sequential decision making under uncertainty and partial observability. However, their high computational complexity limits their practical impact. To overcome this complexity, we focus on a special class of Dec-MDPs consisting of independent agents that collaborate through a global joint-reward function, which depends on their entire histories of states and actions, to accomplish joint tasks. We make the following contributions to address the issue of scalability for this class of problems. (1) Dec-NLP: a nonlinear programming (NLP) formulation for this event-based planning model that scales well for small problems; empirically, however, as the number of agents grows, NLP solvers fail to scale, become slow, and often run out of memory. (2) Dec-EM: a probabilistic-inference-based approach that scales much better than NLP solvers to large numbers of agents; its scalability still suffers, however, when each agent's state space is exponential, as is often the case in patrolling and coverage problems. (3) Dec-MARL: a policy-gradient-based multi-agent reinforcement learning (RL) approach that scales well even for exponential state spaces. (4) Dec-ESR: a new actor-critic RL approach for event-based Dec-MDPs that produces better solution quality than Dec-MARL by using successor features (SF), a value-function representation that decouples the dynamics of the environment from the rewards.
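The decoupling that successor features provide can be sketched as follows. This is a minimal illustrative example of the general SF idea, not the thesis's Dec-ESR implementation: it assumes per-step rewards decompose as r(s, a) = φ(s, a)·w, so that Q(s, a) = ψ(s, a)·w, where ψ is the expected discounted sum of features under the policy. The dynamics (a toy cyclic transition) and all names here are hypothetical.

```python
import numpy as np

gamma = 0.9
n_states, n_actions, n_feats = 4, 2, 3

rng = np.random.default_rng(0)
phi = rng.random((n_states, n_actions, n_feats))  # state-action features
w = np.array([1.0, -0.5, 0.2])                    # reward weights for one task

def successor_features(n_iters=200):
    """Fixed-point iteration for psi under a uniform-random policy and
    toy deterministic dynamics: next state = (s + a) mod n_states."""
    psi = np.zeros((n_states, n_actions, n_feats))
    for _ in range(n_iters):
        new_psi = np.empty_like(psi)
        for s in range(n_states):
            for a in range(n_actions):
                s2 = (s + a) % n_states
                # Bellman-style update: current features plus discounted
                # expected successor features under the random policy.
                new_psi[s, a] = phi[s, a] + gamma * psi[s2].mean(axis=0)
        psi = new_psi
    return psi

psi = successor_features()
Q = psi @ w                     # action values for this task
w2 = np.array([0.0, 1.0, 1.0])  # a different task: only the rewards change
Q2 = psi @ w2                   # re-evaluated without relearning dynamics
```

The key point the abstract makes is visible in the last two lines: once ψ is learned, changing the task only means changing w, since the environment dynamics are captured entirely inside ψ.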
We then present how Dec-ESR generalizes learning for event-based Dec-MDPs using SF within an end-to-end deep RL framework. (5) We show that the proposed SF-based method allows useful transfer of information across related but different tasks, bootstrapping learning on new tasks and making convergence on them much faster. Our inference- and RL-based advances enable us to solve a large real-world multi-agent coverage problem modeling schedule coordination of agents in a real urban subway network, where other approaches fail to scale.

Full thesis: pdf
Centre for Data Engineering
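The transfer claim in (5) can be illustrated with a small hedged sketch, again under the standard SF assumption (not taken from the thesis) that rewards are linear in known features: on a new but related task, only the reward weights w need to be re-fit, e.g. by least squares on observed per-step rewards, after which any cached ψ immediately yields values Q = ψ·ŵ. All variables below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_feats = 3

# Features observed while interacting with the new task, and the task's
# (unknown to the learner) true reward weights. Noise-free for simplicity.
phi_samples = rng.random((500, n_feats))
w_true = np.array([0.3, -1.0, 0.7])
rewards = phi_samples @ w_true

# Fitting w is an ordinary least-squares problem; successor features
# learned on the earlier task are reused unchanged for value estimates.
w_hat, *_ = np.linalg.lstsq(phi_samples, rewards, rcond=None)
```

This is why SF-based transfer bootstraps learning: estimating a low-dimensional w from reward samples is far cheaper than relearning a value function from scratch.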
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.