Eligibility traces and forgetting factor in recursive least‐squares‐based temporal difference
Author(s) - Baldi Simone, Zhang Zichen, Liu Di
Publication year - 2022
Publication title - International Journal of Adaptive Control and Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.73
H-Index - 66
eISSN - 1099-1115
pISSN - 0890-6327
DOI - 10.1002/acs.3282
Subject(s) - benchmark , recursive least squares filter , forgetting , temporal difference learning , computer science , reinforcement learning , algorithm , control , mathematical optimization , mathematics , artificial intelligence , adaptive filter
Summary - We propose a new reinforcement learning method in the framework of Recursive Least Squares‐Temporal Difference (RLS‐TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS‐TD(λ)), we propose to use the forgetting factor commonly employed in gradient‐based or least‐squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS‐TD with forgetting factor (RLS‐TD‐f). An interesting aspect of the proposed algorithm is that it can be interpreted as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We take a cart‐pole benchmark and an adaptive cruise control benchmark as experimental platforms.
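Illustration - To make the role of the forgetting factor concrete, the following is a minimal Python sketch of how a forgetting factor typically enters an RLS‐TD(0)‐style update with linear value‐function approximation, using the current feature vector as the instrumental variable. The class name, hyperparameter defaults, and recursions below follow standard RLS‐with‐forgetting conventions and are illustrative assumptions, not the paper's exact RLS‐TD‐f derivation.

```python
import numpy as np

class RLSTDForgetting:
    """Sketch: RLS-TD(0)-style value estimation with a forgetting factor.

    Standard RLS-TD uses phi(s) as the instrumental variable against the
    regressor phi(s) - gamma * phi(s'). The forgetting factor mu in (0, 1]
    exponentially discounts old samples, which the paper argues plays a
    role similar to eligibility traces. This is a generic illustration,
    not the paper's exact RLS-TD-f recursion.
    """

    def __init__(self, n_features, gamma=0.99, mu=0.98, delta=1.0):
        self.gamma = gamma                    # discount factor
        self.mu = mu                          # forgetting factor; mu = 1 gives plain RLS-TD
        self.theta = np.zeros(n_features)     # value-function weights
        self.P = np.eye(n_features) / delta   # inverse-correlation estimate

    def update(self, phi, phi_next, reward):
        # Regressor of the TD fixed-point equation; phi acts as instrument.
        z = phi - self.gamma * phi_next
        Pphi = self.P @ phi
        # Gain with forgetting: past data is down-weighted by mu each step.
        k = Pphi / (self.mu + z @ Pphi)
        td_error = reward - z @ self.theta
        self.theta += k * td_error
        self.P = (self.P - np.outer(k, z @ self.P)) / self.mu
        return td_error

    def value(self, phi):
        return phi @ self.theta

# Example usage with hypothetical 4-dimensional features:
est = RLSTDForgetting(n_features=4)
est.update(np.ones(4), np.zeros(4), reward=1.0)
```

Setting mu = 1 recovers the ordinary RLS‐TD recursion, so the forgetting factor is the single knob controlling how quickly old transitions are discounted, which is the sense in which the paper relates it to eligibility traces.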