The credit assignment problem in cortico‐basal ganglia‐thalamic networks: A review, a problem and a possible solution
Author(s) - Rubin Jonathan E., Vich Catalina, Clapp Matthew, Noneman Kendra, Verstynen Timothy
Publication year - 2021
Publication title - European Journal of Neuroscience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.346
H-Index - 206
eISSN - 1460-9568
pISSN - 0953-816X
DOI - 10.1111/ejn.14745
Subject(s) - action selection , reinforcement learning , basal ganglia , neuroscience , computer science , dopaminergic , action (physics) , neurophysiology , computational model , normative , artificial intelligence , indirect pathway of movement , psychology , machine learning , dopamine , central nervous system , philosophy , physics , epistemology , quantum mechanics , perception
The question of how cortico‐basal ganglia‐thalamic (CBGT) pathways use dopaminergic feedback signals to modify future decisions has challenged computational neuroscientists for decades. Reviewing the literature on computational representations of dopaminergic corticostriatal plasticity, we show how the field is converging on a normative, synaptic‐level learning algorithm that elegantly captures both neurophysiological properties of CBGT circuits and behavioral dynamics during reinforcement learning. Unfortunately, the computational studies that have led to this normative algorithmic model have all relied on simplified circuits that use abstracted action‐selection rules. As a result, the application of this corticostriatal plasticity algorithm to a full model of the CBGT pathways immediately fails because the spatiotemporal distance between integration (corticostriatal circuits), action selection (thalamocortical loops) and learning (nigrostriatal circuits) means that the network does not know which synapses should be reinforced to favor previously rewarding actions. We show how observations from neurophysiology, in particular the sustained activation of selected action representations, can provide a simple means of resolving this credit assignment problem in models of CBGT learning. Using a biologically realistic spiking model of the full CBGT circuit, we demonstrate how this solution can allow a network to learn to select optimal targets and to relearn action‐outcome contingencies when the environment changes. This simple illustration highlights how the normative framework for corticostriatal plasticity can be expanded to capture macroscopic network dynamics during learning and decision‐making.
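The core idea in the abstract, that sustained activity of the selected action can mark which synapses a delayed dopaminergic signal should modify, can be illustrated with a toy sketch. This is not the authors' spiking CBGT model: it assumes an abstracted softmax action-selection rule, a single scalar weight per action, and a simple reward-prediction-error update gated by a decaying eligibility trace. All names (`run_block`, `demo`) and parameter values (`lr`, `beta`, `delay_steps`, `trace_decay`) are hypothetical and chosen only for illustration.

```python
import math
import random


def run_block(weights, reward_probs, n_trials, rng,
              lr=0.05, beta=5.0, delay_steps=5, trace_decay=0.9):
    """Update `weights` in place over `n_trials` trials.

    Illustrative assumptions only: one weight per action, softmax
    selection, and a reward that arrives `delay_steps` after selection.
    """
    n_actions = len(weights)
    for _ in range(n_trials):
        # Abstracted action selection: softmax over per-action weights.
        prefs = [math.exp(beta * w) for w in weights]
        draw = rng.random() * sum(prefs)
        action = n_actions - 1
        cumulative = 0.0
        for i, pref in enumerate(prefs):
            cumulative += pref
            if draw < cumulative:
                action = i
                break

        # Sustained activation of the selected action tags its synapses.
        # The tag (eligibility trace) decays during the delay before reward.
        trace = [0.0] * n_actions
        trace[action] = 1.0
        for _ in range(delay_steps):
            trace = [t * trace_decay for t in trace]

        # Delayed feedback: dopamine-like signal = reward - expectation.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        dopamine = reward - weights[action]

        # Only synapses that are still tagged receive credit or blame.
        for i in range(n_actions):
            weights[i] += lr * dopamine * trace[i]
    return weights


def demo(seed=0):
    """Learn a two-action task, then relearn after the contingencies swap."""
    rng = random.Random(seed)
    weights = [0.5, 0.5]
    run_block(weights, [0.8, 0.2], 2000, rng)
    after_learning = list(weights)
    run_block(weights, [0.2, 0.8], 2000, rng)  # environment changes
    after_reversal = list(weights)
    return after_learning, after_reversal


if __name__ == "__main__":
    learned, reversed_ = demo()
    print("after learning:", learned)
    print("after reversal:", reversed_)
```

In this sketch the trace is what resolves credit assignment: if every action carried the same trace at reward time, the update could not distinguish the action that earned the reward from the one that did not.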