When, What, and How Much to Reward in Reinforcement Learning‐Based Models of Cognition

Author(s) - Janssen Christian P., Gray Wayne D.

Publication year - 2012

Publication title - Cognitive Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.498

H-Index - 114

eISSN - 1551-6709

pISSN - 0364-0213

DOI - 10.1111/j.1551-6709.2011.01222.x

Subject(s) - reinforcement learning, task (project management), categorical variable, context (archaeology), cognition, computer science, cognitive psychology, artificial intelligence, reinforcement, function (biology), sequence (biology), moment (physics), sequence learning, machine learning, action (physics), psychology, social psychology, paleontology, genetics, physics, management, classical mechanics, neuroscience, evolutionary biology, economics, biology, quantum mechanics

Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other interval of task performance), what (the objective function: e.g., performance time or performance accuracy), and how much (the magnitude: with binary, categorical, or continuous values). In this article, we explore the problem space of these three parameters in the context of a task whose completion entails some combination of 36 state–action pairs, where all intermediate states (i.e., after the initial state and prior to the end state) represent progressive but partial completion of the task. Different choices produce profoundly different learning paths and outcomes, with the strongest effect for moment. Unfortunately, there is little discussion in the literature of the effect of such choices. This absence is disappointing, as the choice of when, what, and how much needs to be made by a modeler for every learning model.
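The three choices named in the abstract can be made concrete with a small sketch. The Python below is not the authors' model. It is a minimal illustration, under assumed names, of how one recorded trial yields different reward signals depending on when (end of trial or after each subtask), what (time or accuracy), and how much (binary or continuous). The `Step` record, the function names, the `threshold` parameter, and the fixed `subtask_len` are all hypothetical, and the categorical magnitude is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    duration: float  # time spent on this step (hypothetical unit: seconds)
    correct: bool    # whether the step was performed without error


def score(steps: List[Step], what: str) -> float:
    """Objective function ('what'); higher is always better."""
    if what == "time":
        return -sum(s.duration for s in steps)  # faster scores higher
    if what == "accuracy":
        return sum(s.correct for s in steps) / len(steps)
    raise ValueError(f"unknown objective: {what}")


def scale_reward(value: float, how_much: str, threshold: float) -> float:
    """Magnitude ('how much'): binary pass/fail or the raw continuous score."""
    if how_much == "binary":
        return 1.0 if value >= threshold else 0.0
    if how_much == "continuous":
        return value
    raise ValueError(f"unknown magnitude: {how_much}")


def rewards(trial: List[Step], when: str, what: str, how_much: str,
            threshold: float = 0.0, subtask_len: int = 2) -> List[Tuple[int, float]]:
    """Return (index of the step after which reward arrives, reward) pairs."""
    if when == "end_of_trial":
        chunks = [trial]
    elif when == "subtask":
        # Assumption: subtasks are fixed-length runs of consecutive steps.
        chunks = [trial[i:i + subtask_len]
                  for i in range(0, len(trial), subtask_len)]
    else:
        raise ValueError(f"unknown moment: {when}")

    deliveries = []
    end = 0
    for chunk in chunks:
        end += len(chunk)
        value = score(chunk, what)
        deliveries.append((end - 1, scale_reward(value, how_much, threshold)))
    return deliveries


trial = [Step(1.0, True), Step(2.0, False), Step(0.5, True), Step(0.5, True)]
print(rewards(trial, "end_of_trial", "accuracy", "continuous"))
print(rewards(trial, "subtask", "accuracy", "binary", threshold=1.0))
print(rewards(trial, "subtask", "time", "binary", threshold=-2.0))
```

The same four steps produce one delayed continuous reward in the first call and two earlier pass/fail rewards in the others, so a learner updating its state–action values would receive quite different feedback from identical behavior.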
