IIIT Hyderabad Publications
A Neuro-computational Model of Reward Prediction Error in Classical Conditioning

Author: Pramod Sivaram Kaushik
Date: 2017-12-30
Report no: IIIT/TH/2017/95
Advisor: Bapi Raju Surampudi

Abstract

Dopamine plays a major role in learning and reward. Dopaminergic neurons in the ventral tegmental area (VTA) signal the discrepancy between expected and actual rewards, known as the reward prediction error (RPE), but how this signal is computed in the brain remains unknown. One of the earliest investigations of animal learning paired an unconditioned stimulus (US) with a cue, or conditioned stimulus (CS), and observed that after some time the animal begins responding to the CS alone. This is the basis of Pavlovian learning, or classical conditioning, a fundamental learning mechanism in animals. The model described here focuses on the mechanism of reward prediction error within Pavlovian learning and does not address other conditioning phenomena. To gain a deeper understanding of the roles of the cerebral structures involved in RPE computation, we need a neuro-computational model that implicates them more faithfully in Pavlovian learning. The model proposed here describes a circuit in which magnitude expectation and timing expectation are dissociated, and explains more precisely how the reward prediction error is computed inside the VTA. It proposes that the two dimensions involved in computing reward prediction errors, magnitude and time, could be computed separately and later combined, unlike in traditional reinforcement learning models.
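For context, the traditional reinforcement-learning account that the abstract contrasts with is the temporal-difference (TD) model of dopamine, in which a single value function over a tapped-delay-line state representation entangles reward magnitude and timing. The sketch below is a minimal illustration of that standard TD picture, not the circuit proposed in the thesis; all parameter values (trial length, CS and US times, learning rate) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the standard TD(0) model of dopamine RPE, in which
# magnitude and timing are NOT dissociated: a single value weight per
# post-CS time step predicts the upcoming reward. Parameters below are
# illustrative assumptions.
T = 20               # time steps per trial
cs_t, us_t = 5, 15   # CS onset and reward (US) delivery times
alpha, gamma = 0.1, 1.0
reward_size = 1.0

def run_trial(V):
    """One conditioning trial; mutates V in place and returns the RPE
    (TD error, the dopamine-like signal) at each time step."""
    delta = np.zeros(T)
    for t in range(T - 1):
        r = reward_size if t == us_t else 0.0
        x_now = V[t] if t >= cs_t else 0.0        # stimulus trace active after CS
        x_next = V[t + 1] if t + 1 >= cs_t else 0.0
        delta[t] = r + gamma * x_next - x_now     # TD error
        if t >= cs_t:
            V[t] += alpha * delta[t]              # learn value of this time step
    return delta

V = np.zeros(T)
first = run_trial(V.copy())   # naive animal: RPE peaks at the US
trained_V = V.copy()
for _ in range(500):          # repeated CS-US pairings
    d = run_trial(trained_V)
late = d                      # well trained: RPE has shifted to CS onset
```

Run on a naive value function, the RPE fires at reward delivery; after repeated pairings the predicted reward is progressively cancelled and the RPE appears at CS onset instead, matching the classic dopamine recordings that the thesis also reproduces, but with time and magnitude computed jointly rather than separately.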
The model is built on biological evidence and reproduces various aspects of classical conditioning: the progressive cancellation of the predicted reward, predictive firing from conditioned stimuli, the twin peaks of dopamine firing, and the delineation of early rewards, showing more firing for rewards that arrive sooner than expected and less for early rewards that occur with a longer latency, in accordance with biological data. Key experimental predictions are made regarding how the neural substrates execute this computation, which would validate the model. Potential implications for reinforcement learning theory are also discussed.

Full thesis: pdf

Centre for Cognitive Science
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.