Open Access

Multirobot Collaborative Pursuit Target Robot by Improved MADDPG

Author(s) - Xiao Zhou, Song Zhou, Xingang Mou, Yi He

Publication year - 2022

Publication title - Computational Intelligence and Neuroscience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.605

H-Index - 52

eISSN - 1687-5273

pISSN - 1687-5265

DOI - 10.1155/2022/4757394

Subject(s) - pursuer, computer science, task (project management), robot, pursuit evasion, artificial intelligence, curiosity, constraint (computer aided design), reinforcement learning, similarity (geometry), mathematical optimization, mathematics, engineering, psychology, social psychology, geometry, systems engineering, image (mathematics)

Policy formulation is one of the main problems in multirobot systems, especially in multirobot pursuit-evasion scenarios, where sparse rewards and random environment changes make it difficult to find good strategies. Most existing multirobot decision-making methods use environmental rewards to drive robots toward the target task, and this does not achieve good results. This paper proposes a multirobot pursuit method based on an improved multiagent deep deterministic policy gradient (MADDPG) algorithm, which addresses the problem of sparse rewards in multirobot pursuit-evasion scenarios by combining an intrinsic reward with the reward from the external environment. A state similarity module based on a threshold constraint forms part of the intrinsic reward signal output by the intrinsic curiosity module; it is used to balance overexploration against insufficient exploration, so that the agent can use the intrinsic reward more effectively to learn better strategies. Simulation results show that the proposed method significantly improves both the robots' reward value and the success rate of the pursuit task. The change is reflected directly in the real-time distance between the pursuer and the escapee: a pursuer trained with the improved algorithm closes in on the escapee more quickly, and the average following distance also decreases.
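The reward shaping described in the abstract can be illustrated with a small sketch. The abstract does not give the similarity measure, the threshold value, or the exact rule by which the state similarity module modifies the curiosity signal, so everything below is an assumption made for illustration: the curiosity term uses the common forward-model prediction-error form, similarity is taken as cosine similarity between consecutive states, and the hypothetical parameters `eta`, `beta`, and `threshold` are placeholders rather than values from the paper.

```python
import numpy as np


def curiosity_reward(pred_next_feat, next_feat, eta=0.5):
    """Curiosity bonus as the scaled squared error of a forward-model prediction.

    eta is a hypothetical scaling factor, not a value from the paper.
    """
    diff = np.asarray(pred_next_feat, dtype=float) - np.asarray(next_feat, dtype=float)
    return 0.5 * eta * float(np.dot(diff, diff))


def state_similarity(state, next_state):
    """Cosine similarity between two state vectors (an assumed measure)."""
    state = np.asarray(state, dtype=float)
    next_state = np.asarray(next_state, dtype=float)
    denom = np.linalg.norm(state) * np.linalg.norm(next_state)
    if denom == 0.0:
        # Treat degenerate zero vectors as fully similar, so no bonus is paid.
        return 1.0
    return float(np.dot(state, next_state) / denom)


def combined_reward(extrinsic, pred_next_feat, next_feat, state, next_state,
                    threshold=0.95, beta=0.1):
    """Environment reward plus a similarity-gated curiosity bonus.

    Assumed gating rule: when consecutive states are at least `threshold`
    similar, the curiosity bonus is dropped, so the agent is not rewarded
    for transitions that barely change its state.
    """
    r_int = curiosity_reward(pred_next_feat, next_feat)
    if state_similarity(state, next_state) >= threshold:
        r_int = 0.0
    return extrinsic + beta * r_int


# Dissimilar consecutive states: the curiosity bonus is added to the reward.
r_novel = combined_reward(1.0, [1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Identical consecutive states: only the environment reward remains.
r_stale = combined_reward(1.0, [1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
```

In a MADDPG-style setup, a shaped reward of this kind would replace the raw environment reward stored in each agent's replay buffer. Whether the paper gates, scales, or otherwise combines the two signals is not stated in the abstract.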
