Open Access
AdvB-TD3: A Novel Decision-Making Framework for Complex Continuous Control Tasks
Author(s) - O. Osman, T. Karaca, B. Yalcin Kavus, G. Tulum
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3621002
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Reinforcement learning (RL) has emerged as a transformative approach for solving complex decision-making problems, particularly in continuous action domains. In this study, we introduce the Advisory Board Twin Delayed Deep Deterministic Policy Gradient (AdvB-TD3) framework, which integrates the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with a cooperative advisory board structure to enhance decision-making performance in dynamic and stochastic environments. The AdvB-TD3 framework employs a shared critic network and dynamic member management, enabling robust, adaptive, and scalable decision-making processes. The proposed methodology was rigorously evaluated in MuJoCo environments. Experimental results demonstrated that AdvB-TD3 consistently outperformed TD3 across diverse tasks, achieving faster convergence, higher cumulative rewards, and reduced performance variability. Specifically, AdvB-TD3 achieved performance scores of 309.3 ± 0.95 in BipedalWalker-v3, 4401.5 ± 88.8 in HalfCheetah-v4, 5303.5 ± 26.0 in Humanoid-v4, 1000 ± 0.0 in InvertedPendulum-v4, 9347.10 ± 1.0 in InvertedDoublePendulum-v4, 95.5 ± 0.2 in MountainCarContinuous-v0, -130.1 ± 81.5 in Pendulum-v1, and 317.0 ± 4.2 in Swimmer-v4. When compared to other state-of-the-art algorithms, such as Q-Prop, ACER, and Off-Policy TRPO, AdvB-TD3 exhibited superior stability and learning efficiency, particularly in complex environments like BipedalWalker and Humanoid. Moreover, the framework’s ability to maintain low variability across trials highlights its reliability in high-dimensional tasks. Overall, AdvB-TD3 establishes a new benchmark for RL methods, outperforming conventional and advanced approaches in continuous control domains. Its innovative architecture and demonstrated superiority across a wide range of scenarios position it as a robust, efficient, and scalable solution for high-dimensional decision-making problems, with significant implications for advancing AI-driven systems.
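
The full text is not included in this record, so the abstract's description of an "advisory board" of cooperating actors sharing a single critic can only be illustrated schematically. The sketch below is a minimal, assumption-laden rendering in PyTorch: the class names (`Actor`, `TwinCritic`, `AdvisoryBoard`), the member count, and the rule of letting the shared twin critics score each member's proposed action are all illustrative choices, not details taken from the paper.

```python
# Minimal sketch of a shared-critic "advisory board" over TD3-style networks.
# NOT the authors' implementation: member count, the critic-based vote, and
# all hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """One advisory-board member: maps a state to a bounded action."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)


class TwinCritic(nn.Module):
    """TD3-style twin Q-networks, shared by every board member."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        def q_net():
            return nn.Sequential(
                nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        return self.q1(sa), self.q2(sa)


class AdvisoryBoard:
    """Hypothetical board: each member proposes an action, the shared twin
    critics score the proposals, and the highest-valued one is executed."""
    def __init__(self, state_dim, action_dim, max_action, n_members=3):
        self.members = [Actor(state_dim, action_dim, max_action)
                        for _ in range(n_members)]
        self.critic = TwinCritic(state_dim, action_dim)  # shared critic

    @torch.no_grad()
    def select_action(self, state):
        proposals = [m(state) for m in self.members]
        # Score each proposal with the pessimistic (min) of the twin Q-values,
        # mirroring TD3's clipped double-Q estimate.
        scores = [torch.min(*self.critic(state, a)) for a in proposals]
        best = max(range(len(proposals)), key=lambda i: scores[i].item())
        return proposals[best]


if __name__ == "__main__":
    board = AdvisoryBoard(state_dim=17, action_dim=6, max_action=1.0)
    state = torch.randn(1, 17)          # e.g. a HalfCheetah-like observation
    print(board.select_action(state))   # action chosen by the board
```

How the board's membership is managed dynamically, and how the shared critic is trained from the members' combined experience, are described only at a high level in the abstract and are not reproduced here.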
