Anticipatory reward signals in ventral striatal neurons of behaving rats
Author(s) -
Khamassi Mehdi,
Mulder Antonius B.,
Tabuchi Eiichi,
Douchamps Vincent,
Wiener Sidney I.
Publication year - 2008
Publication title -
European Journal of Neuroscience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.346
H-Index - 206
eISSN - 1460-9568
pISSN - 0953-816X
DOI - 10.1111/j.1460-9568.2008.06480.x
Subject(s) - ventral striatum, striatum, anticipation (artificial intelligence), neuroscience, dopaminergic, psychology, temporal difference learning, reward-prediction error, computer science, artificial intelligence, reinforcement learning, machine learning, dopamine
It has been proposed that the striatum plays a crucial role in learning to select appropriate actions, optimizing rewards according to the principles of 'Actor–Critic' models of trial-and-error learning. The ventral striatum (VS), as Critic, would employ a temporal difference (TD) learning algorithm to predict rewards and drive dopaminergic neurons. This study examined the model's adequacy for VS responses to multiple rewards in rats. The respective arms of a plus-maze provided rewards of varying magnitudes; multiple rewards were delivered at 1-s intervals while the rat stood still. Neurons discharged phasically prior to each reward, during both initial approach and immobile waiting, demonstrating that this signal is predictive rather than simply motor-related. In different neurons, responses could be greater for early, middle or late droplets in the sequence. Strikingly, this activity often reappeared after the final reward, as if in anticipation of yet another. In contrast, previous TD learning models show decremental reward-prediction profiles during reward consumption, owing to a temporal-order signal introduced to reproduce accurate timing in dopaminergic reward-prediction error signals. To resolve this inconsistency in a biologically plausible manner, we adapted the TD learning model so that input information is nonhomogeneously distributed among different neurons. With reward temporal-order signals suppressed and the richness of spatial and visual input information varied across neurons, the model reproduced the experimental data. This demonstrates the feasibility of a TD-learning architecture in which different groups of neurons participate in solving the task on the basis of varied input information.
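For context, the Critic in standard Actor–Critic/TD formulations (a textbook sketch; the adapted model described above may differ in its exact inputs and parameters) computes a reward-prediction error $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ and updates its value estimate as $V(s_t) \leftarrow V(s_t) + \alpha\,\delta_t$, where $r_t$ is the reward received at time $t$, $\gamma$ the temporal discount factor and $\alpha$ a learning rate. Under this rule, $\delta_t$ is the quantity conventionally identified with the phasic dopaminergic reward-prediction error signal referred to in the abstract.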