
A Study of First-Passage Time Minimization via Q-Learning in Heated Gridworlds
Author(s) - Maria A. Larchenko, Pavel Osinenko, Grigory Yaremenko, Vladimir V. Palyulin
Publication year - 2021
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2021.3129709
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, one often encounters unevenly distributed noise levels across the environment. We extensively study how a learning agent fares in one- and two-dimensional heated gridworlds with an uneven temperature distribution. The results reveal bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA, and Double Q-learning: the state dependency of the noise triggers convergence to suboptimal solutions, and the resulting policies persist for practically relevant learning times. A high learning rate prevents exploration of regions with higher temperature, while a sufficiently low learning rate increases the presence of agents in such regions. These biases of temporal-difference-based reinforcement learning methods may have implications for their application in real-world physical scenarios and for agent design.
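
As an illustration of the setup described in the abstract, below is a minimal sketch of tabular Q-learning in a one-dimensional heated gridworld with state-dependent noise. It is not the authors' implementation: the grid length, temperature profile, per-step reward of -1, and hyperparameters are illustrative assumptions.

# Minimal sketch of tabular Q-learning in a 1D "heated" gridworld.
# NOT the paper's implementation: grid length, temperature profile,
# step reward and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N = 20                                   # number of cells; goal at the right edge (assumption)
temperature = np.zeros(N)
temperature[N // 2 : 3 * N // 4] = 0.4   # hotter (noisier) region in the middle (assumption)

ACTIONS = (-1, +1)                       # step left / step right

def step(state, action_idx):
    """Apply the chosen step; with probability given by the local
    temperature, thermal noise overrides it with a random step."""
    move = ACTIONS[action_idx]
    if rng.random() < temperature[state]:
        move = rng.choice(ACTIONS)                  # state-dependent noise
    next_state = int(np.clip(state + move, 0, N - 1))
    done = next_state == N - 1
    reward = -1.0                                   # -1 per step
    return next_state, reward, done

def train(alpha=0.1, gamma=1.0, eps=0.1, episodes=5000):
    Q = np.zeros((N, len(ACTIONS)))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
            s2, r, done = step(s, a)
            # standard off-policy temporal-difference (Q-learning) update
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            s = s2
    return Q

if __name__ == "__main__":
    Q = train()
    greedy = np.argmax(Q, axis=1)                   # 0 = left, 1 = right
    print("Greedy actions along the line:", greedy)

With an undiscounted reward of -1 per step, maximizing the return is equivalent to minimizing the expected number of steps to the goal, i.e., the first-passage time, which matches the optimization objective stated in the abstract.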