Faster Teaching via POMDP Planning
Author(s) -
Rafferty An.,
Brunskill Emma,
Griffiths Thomas L.,
Shafto Patrick
Publication year - 2016
Publication title -
Cognitive Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.498
H-Index - 114
eISSN - 1551-6709
pISSN - 0364-0213
DOI - 10.1111/cogs.12290
Subject(s) - partially observable Markov decision process, computer science, plan (archaeology), process (computing), action (physics), frame (networking), selection (genetic algorithm), action selection, artificial intelligence, Markov decision process, tracking (education), mathematics education, machine learning, Markov process, agency (philosophy), Markov chain, Markov model, psychology, pedagogy, mathematics, philosophy, history, archaeology, operating system, telecommunications, epistemology, quantum mechanics, statistics, physics
Human and automated tutors attempt to choose pedagogical activities that will maximize student learning, informed by their estimates of the student's current knowledge. There has been substantial research on tracking and modeling student learning, but significantly less attention on how to plan teaching actions and how the assumed student model impacts the resulting plans. We frame the problem of optimally selecting teaching actions using a decision‐theoretic approach and show how to formulate teaching as a partially observable Markov decision process planning problem. This framework makes it possible to explore how different assumptions about student learning and behavior should affect the selection of teaching actions. We consider how to apply this framework to concept learning problems, and we present approximate methods for finding optimal teaching actions, given the large state and action spaces that arise in teaching. Through simulations and behavioral experiments, we explore the consequences of choosing teacher actions under different assumed student models. In two concept‐learning tasks, we show that this technique can accelerate learning relative to baseline performance.
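As a rough illustration of what framing teaching as a partially observable Markov decision process involves, the sketch below shows the standard discrete POMDP belief update, in which a teacher revises its estimate of the student's hidden knowledge state after each teaching action and student response. It is not the authors' model or planner: the two knowledge states, the "example" and "quiz" actions, and all probabilities are hypothetical values chosen only to make the update concrete.

```python
from typing import Dict, Tuple

Belief = Dict[str, float]
# transition[(state, action)][next_state] = P(next_state | state, action)
Transition = Dict[Tuple[str, str], Dict[str, float]]
# obs_likelihood[(next_state, action)][observation] = P(observation | next_state, action)
ObsModel = Dict[Tuple[str, str], Dict[str, float]]


def belief_update(belief: Belief, transition: Transition, obs_likelihood: ObsModel,
                  action: str, observation: str) -> Belief:
    """Standard POMDP belief update:
    b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: how likely is s_next after taking the action?
        predicted = sum(transition[(s, action)][s_next] * belief[s] for s in states)
        # Correction step: weight by the likelihood of the observed response.
        unnormalized[s_next] = obs_likelihood[(s_next, action)][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("Observation has zero probability under the model.")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state student model (all numbers are illustrative assumptions).
transition: Transition = {
    # Showing an example may move the student from unlearned to learned.
    ("unlearned", "example"): {"unlearned": 0.7, "learned": 0.3},
    ("learned", "example"): {"unlearned": 0.0, "learned": 1.0},
    # A quiz is assumed not to change the student's knowledge.
    ("unlearned", "quiz"): {"unlearned": 1.0, "learned": 0.0},
    ("learned", "quiz"): {"unlearned": 0.0, "learned": 1.0},
}
obs_likelihood: ObsModel = {
    # An example yields no informative response from the student.
    ("unlearned", "example"): {"none": 1.0},
    ("learned", "example"): {"none": 1.0},
    # A quiz yields a noisy signal of the student's knowledge.
    ("unlearned", "quiz"): {"correct": 0.2, "incorrect": 0.8},
    ("learned", "quiz"): {"correct": 0.9, "incorrect": 0.1},
}

prior: Belief = {"unlearned": 0.5, "learned": 0.5}
after_example = belief_update(prior, transition, obs_likelihood, "example", "none")
after_quiz = belief_update(after_example, transition, obs_likelihood, "quiz", "correct")
print(after_example)
print(after_quiz)
```

In this toy setting the example action shifts belief toward the learned state without revealing anything, while the quiz leaves knowledge unchanged but sharpens the teacher's estimate. A planner would weigh such trade-offs when choosing the next action; the approximate planning methods mentioned in the abstract are not reproduced here.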