Optimal tracking agent: a new framework of reinforcement learning for multiagent systems
Author(s) -
Cao Weihua,
Chen Gang,
Chen Xin,
Wu Min
Publication year - 2012
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.2870
Subject(s) - reinforcement learning , computer science , action selection , curse of dimensionality , bellman equation , convergence , estimator , artificial intelligence , mathematical optimization , q learning , multiagent systems
SUMMARY The curse of dimensionality is a ubiquitous problem in multiagent reinforcement learning: the learning and storage space grows exponentially with the number of agents, which hinders the application of multiagent reinforcement learning. To relieve this problem, we propose a new framework named the optimal tracking agent (OTA). The OTA views the other agents as part of the environment and uses a reduced form to learn the optimal decision. Although merging the other agents into the environment reduces the dimension of the action space, the environment characterized in this form is dynamic and does not satisfy the convergence conditions of reinforcement learning (RL). We therefore develop an estimator to track the dynamics of the environment. The estimator obtains a dynamic model, and model‐based RL can then be used to react optimally to the dynamic environment. Because the Q‐function in the OTA is itself a dynamic process, owing to the other agents' dynamics, it differs from traditional RL, in which learning is a stationary process and the usual action selection mechanisms suit only such stationary processes; we therefore improve the greedy action selection mechanism to adapt to these dynamics, so that the OTA converges. An experiment illustrates the validity and efficiency of the OTA. Copyright © 2012 John Wiley & Sons, Ltd.
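The record contains only the abstract, not the paper's algorithm. As a rough illustration of the approach it describes, the sketch below folds the other agents into a single environment whose transition model is tracked online by a count-based estimator, runs a model-based Q backup against that estimate, and keeps a nonzero exploration rate in the greedy selection so the agent continues to adapt as the environment (i.e., the other agents) drifts. All class and method names are hypothetical; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

class OptimalTrackingAgent:
    """Illustrative sketch (hypothetical names): other agents are treated as
    part of the environment; an estimator tracks its transition dynamics,
    and Q-values are backed up from the estimated model (model-based RL)."""

    def __init__(self, actions, gamma=0.9, epsilon=0.2):
        self.actions = actions
        self.gamma = gamma
        self.epsilon = epsilon  # kept > 0 so exploration persists as dynamics drift
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': visits}
        self.rewards = defaultdict(float)                    # (s, a) -> running mean reward
        self.q = defaultdict(float)                          # (s, a) -> value estimate

    def observe(self, s, a, r, s_next):
        """Estimator step: update the tracked model of the (dynamic) environment."""
        self.counts[(s, a)][s_next] += 1
        n = sum(self.counts[(s, a)].values())
        self.rewards[(s, a)] += (r - self.rewards[(s, a)]) / n
        self._backup(s, a)

    def _backup(self, s, a):
        """Model-based Q update using the estimated transition probabilities."""
        total = sum(self.counts[(s, a)].values())
        expected_next = sum(
            c / total * max(self.q[(s2, b)] for b in self.actions)
            for s2, c in self.counts[(s, a)].items()
        )
        self.q[(s, a)] = self.rewards[(s, a)] + self.gamma * expected_next

    def act(self, s):
        """Greedy selection with persistent exploration, so the policy can
        keep re-adapting to the other agents' changing behavior."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])
```

With the full state-action space of all agents replaced by this reduced form, the table the agent stores grows with one agent's action set rather than with the joint action space, which is the dimensionality relief the abstract refers to.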