IIIT Hyderabad Publications

Mitigating Negative Side Effects

Author: Aishwarya Srivastava 20171046
Date: 2023-05-22
Report no: IIIT/TH/2023/44
Advisor: Praveen Paruchuri

Abstract

Autonomous systems perform tasks across industries ranging from finance to healthcare to space applications. However, these systems are often deployed in the open world, where it is hard to obtain complete specifications of the objectives and constraints. Operating on an incomplete model can produce undesired effects, i.e., Negative Side Effects (NSEs). Negative side effects reduce the system’s safety and reliability and can be of two types: Markovian and non-Markovian. Non-Markovian negative side effects are produced when the agent executes a certain sequence of actions in the deployed environment.

In this thesis, we aim to mitigate negative side effects in environments modeled as Markov decision processes (MDPs). Unlike previous works in this area, which associate negative side effects with state-action pairs, our framework associates them with entire trajectories, which is more general and captures non-Markovian dependence on states and actions. Prior works mitigate Markovian negative side effects and cannot be easily extended to non-Markovian ones. We build a framework, Controller-Assisted Safe Planning (CASP), for mitigating non-Markovian negative side effects. Our primary contributions are:

1. We design a model based on Finite State Controllers (FSCs) that can predict the severity of negative side effects for a given trajectory.
2. We learn the model parameters from observed data containing state-action trajectories and the severity of the associated negative side effects. The model is learned such that it generalizes well to unseen data. Information about negative side effects is gathered through Oracle feedback and compactly represented as a finite state controller.
3. We develop a constrained MDP model that uses information from both the underlying MDP and the learned model for planning while avoiding negative side effects.

Our empirical evaluation demonstrates the effectiveness of our approach in learning and mitigating Markovian and non-Markovian negative side effects.

Full thesis: pdf
Centre for Machine Learning Lab
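
To make the abstract's distinction between Markovian and non-Markovian negative side effects concrete, the following is a minimal sketch of how a finite state controller could assign a severity to a whole trajectory rather than to individual state-action pairs. It is not the thesis's model: the class name `FiniteStateController`, the `push`/`move` actions, the state labels, and the hand-written transition and penalty tables are all hypothetical, whereas the thesis learns its controller parameters from Oracle feedback.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (state, action); labels are hypothetical


class FiniteStateController:
    """Toy controller: its internal node summarizes the trajectory history."""

    def __init__(
        self,
        transitions: Dict[Tuple[int, str], int],
        penalties: Dict[Tuple[int, str], float],
        start: int = 0,
    ) -> None:
        self.transitions = transitions  # (node, action) -> next node
        self.penalties = penalties      # (node, action) -> severity incurred
        self.start = start

    def severity(self, trajectory: List[Step]) -> float:
        """Sum the penalties emitted while the controller reads the trajectory."""
        node, total = self.start, 0.0
        for _state, action in trajectory:
            total += self.penalties.get((node, action), 0.0)
            # Unlisted (node, action) pairs reset the controller to its start node.
            node = self.transitions.get((node, action), self.start)
        return total


# Assumed example: a side effect occurs only when "push" directly follows "push".
# Node 0 = previous action was not a push, node 1 = previous action was a push.
fsc = FiniteStateController(
    transitions={(0, "push"): 1, (1, "push"): 1},
    penalties={(1, "push"): 1.0},
)

safe = fsc.severity([("s0", "push"), ("s1", "move"), ("s2", "push")])
unsafe = fsc.severity([("s0", "push"), ("s1", "push"), ("s2", "move")])
```

In this sketch the action `push` is penalized only in the second trajectory, because the penalty depends on the controller node (the history) and not on the action alone. A penalty keyed by action alone could not express this.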
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.