IIIT Hyderabad Publications
Exploring Reinforcement Learning Models For Various Aspects of Decision Making

Author: Gautham Venugopal (2018112003)
Date: 2024-05-03
Report no: IIIT/TH/2024/56
Advisor: Bapi Raju Surampudi

Abstract

When designing computer applications to solve real-world problems, engineers and Machine Learning researchers are free to design models specific to the situation at hand. If an application must have low latency or a small memory footprint, the model is typically tweaked at the design stage to incorporate these properties, and such a model cannot then be reused where reliability or explainability takes precedence, even if the underlying task is the same. Cognitive agents, however, must adapt their decision-making strategies and algorithms to the situation in an online manner. A popular idea in cognitive science for explaining how animals make such trade-offs is to view Decision Making as an Evidence Accumulation process. By modelling Decision Making as the sampling of evidence until a threshold is reached, such models can exhibit different behaviours according to the needs of the situation. In this thesis, we take the sequential sampling approach and combine it with Reinforcement Learning frameworks in an attempt to move towards a more comprehensive model of Decision Making.

In the first portion of the thesis, we explore how Linear Ballistic Accumulators (LBAs) can be incorporated as an action-selection mechanism into Q-Learning models, a combination we refer to as RLLBAs. One important advantage of this is that RLLBAs can utilise reaction-time data in addition to choice data. We compare the performance of RLLBAs and conventional models on a non-trivial Grid Navigation Task with three action choices. RLLBAs predicted the actions taken by the subjects as well as conventional RL models did, while also providing good predictions of the reaction-time data. In addition, RLLBAs revealed significant differences in goodness-of-fit between various forms of arbitration between Model-Free and Model-Based RL, distinctions that are typically harder to obtain from choice data alone.

In the second portion of the thesis, we explore how evidence accumulation can be realised in RL neural networks, taking inspiration from the existing literature on anytime neural networks and structured reservoirs. The central idea is to structure the reservoir's connections so that activity propagates forward through the network across time. As the activity propagates forward it undergoes more processing and becomes less noisy; meanwhile, because the output layer also has access to earlier parts of the reservoir, the model can still respond quickly to sudden changes in the environment when relevant. Experimenting further with connectivity patterns found in the Basal Ganglia, such as parallel pathways, we find that representing different inputs in separate pathways, following the concept of stripes from working-memory models, offers superior accuracy over conventional reservoirs in multi-armed bandit tasks.
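To make the RLLBA action-selection mechanism concrete, the sketch below runs an LBA race over the Q-values of the available actions. It is a minimal illustration, not the thesis's fitted model: the softmax mapping from Q-values to mean drift rates and the parameter values (start-point range A, threshold b, drift noise s, non-decision time t0) are assumptions made here for the example.

```python
import numpy as np

def lba_act(q_values, A=0.5, b=1.0, s=0.2, beta=3.0, t0=0.2, rng=None):
    """One trial of LBA-based action selection over Q-values.

    Each action gets a linear accumulator that races from a uniform
    start point to the threshold b; the first to arrive determines
    both the chosen action and the reaction time, so the same model
    can be fitted to choices and RTs jointly.
    """
    rng = rng or np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    # Assumed mapping: mean drift rates are a softmax over the Q-values.
    mean_drifts = np.exp(beta * q) / np.exp(beta * q).sum()
    while True:
        starts = rng.uniform(0.0, A, size=q.shape)   # start points k ~ U(0, A)
        drifts = rng.normal(mean_drifts, s)          # trial-to-trial drift noise
        valid = drifts > 0                           # only rising accumulators can finish
        if valid.any():
            times = np.full(q.shape, np.inf)
            times[valid] = (b - starts[valid]) / drifts[valid]
            action = int(np.argmin(times))
            return action, t0 + times[action]        # choice and reaction time

# Example coupling with a bandit-style Q-learning update (hypothetical names):
# action, rt = lba_act(Q[state]); Q[state][action] += alpha * (reward - Q[state][action])
```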
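The structured-reservoir idea can likewise be sketched as a connectivity pattern. Below is a minimal construction, under assumed parameters, of a reservoir whose blocks form a feed-forward chain: recurrence stays within each block, between-block weights run only from one block to the next, and a readout over all blocks can answer quickly from early blocks or more accurately from later ones. The block count, sparsity, and spectral radius are illustrative, not the values studied in the thesis.

```python
import numpy as np

def chained_reservoir(n_blocks=4, block_size=50, p=0.1,
                      spectral_radius=0.9, rng=None):
    """Reservoir weight matrix whose blocks form a feed-forward chain."""
    rng = rng or np.random.default_rng(0)
    n = n_blocks * block_size

    def sparse_block():
        mask = rng.random((block_size, block_size)) < p
        return np.where(mask, rng.standard_normal((block_size, block_size)), 0.0)

    W = np.zeros((n, n))  # W[target, source] convention: update is x <- tanh(W @ x + input)
    for i in range(n_blocks):
        lo, hi = i * block_size, (i + 1) * block_size
        W[lo:hi, lo:hi] = sparse_block()                   # recurrence within block i
        if i + 1 < n_blocks:
            W[hi:hi + block_size, lo:hi] = sparse_block()  # forward projection to block i+1 only
    # The matrix is block lower-triangular, so its spectrum comes from the
    # diagonal (within-block) blocks; rescale for the echo-state property.
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    if radius > 0:
        W *= spectral_radius / radius
    return W

# One update step (input u driving the first block only, W_in nonzero on block 0's rows):
# x = np.tanh(W @ x + W_in @ u); a linear readout trained on the full state x
# can weight early blocks for speed and late blocks for accuracy.
```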
Full thesis: pdf
Centre for Cognitive Science