IIIT Hyderabad Publications
A Dual Process Reinforcement Learning Account for Sequential Decision Making and Skill Learning

Author: Tejas Savalia
Date: 2018-07-10
Report no: IIIT/TH/2018/45
Advisor: Bapi Raju Surampudi

Abstract

It has been extensively argued that two distinct processes govern behaviour: Goal-Directed and Habitual. There have been multiple attempts at establishing a mechanism that determines which of the two should be in charge of making sequential decisions leading to a distant reward. This thesis is organized in three parts. First, we explore a simple action-selection task in a rather complex biophysical model of the Basal Ganglia. We implement a probabilistic inference task and show that the model learns a representation of reward contingencies. We then implement a model of Parkinson’s disease for patients with and without medication, as an exploration of how the disease and its drug might act in the Basal Ganglia.

Next, we move on to complex sequential tasks and present a framework that unifies three distinct dichotomies: Goal-Directed versus Habitual behaviour, Explicit versus Implicit learning, and Model-Based versus Model-Free Reinforcement Learning. This framework suggests a hierarchical organization of the mechanisms in each dichotomy, in two forms. First, we suggest that the Goal-Directed controller initially plays the dominant role in behaviour and is then taken over by habitual behaviour across multiple trials as the chunk size increases. Second, we suggest that the most granular actions are executed habitually, whereas more abstract actions are goal-directed. We suggest that goal-directed behaviour occurs with the engagement of attention, as opposed to habitual behaviour, which is more automatic, thereby linking this dichotomy to Explicit versus Implicit learning. We then present the idea that behaviour or execution can be organized in opposite directions to each other, with attention playing the role of the switch.

The final part is a computational implementation of the unified theory presented previously. We show that a hierarchical organization of Goal-Directed and Habitual behaviour, implemented using Model-Based and Model-Free Reinforcement Learning, fares well against the respective pure forms, and that such an organization is suitable for explaining motor skill acquisition. We present a possible functional network of interacting brain areas in the frontal cortex, with the final values of states and actions represented in the striatum, following which the Basal Ganglia model explored earlier could be used to perform action selection. We implement the Parkinson’s model in simulations of a grid world and suggest a qualitative parallel with findings reported in the literature.

Full thesis: pdf
Centre for Cognitive Science

Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.