Reinforcement Learning-Based Footstep Control for Humanoid Robots on Complex Terrain
Author(s) -
William Suliman,
Egor Davydenko,
Ekaterina Chaikovskaia,
Roman Gorbachev
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/ACCESS.2025.3622091
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
We propose a Reinforcement Learning (RL)-based footstep planning framework that enables a bipedal robot to achieve stable locomotion over challenging terrain. At its core, a learned step-following policy executes footstep commands with varying positions, orientations, and heights. These commands are generated by a heuristic planner that processes user-defined velocity and orientation inputs alongside a local height field to adaptively adjust step placement and height. By leveraging two-step foresight commands for each leg, the controller ensures safe and adaptive locomotion in dynamic environments. A comparative study of different foresight horizons (one-step, two-step, and three-step) demonstrates that the two-step controller provides the best trade-off between performance and complexity. Critically, our method plans footsteps without relying on dynamic models, making it simpler to implement and deploy. We show through comparisons with a model-based approach that it achieves competitive performance even without explicit dynamics information. To handle non-flat terrain, our method adopts a modular architecture that decouples perception from control, in contrast to end-to-end approaches that map perception inputs directly to motor commands. Perceptual information is processed into footstep commands, which are then provided to the controller. This design improves interpretability, simplifies debugging, and increases safety by ensuring the low-level policy consistently receives physically feasible commands. Trained in simulation to navigate environments with randomly placed obstacles, the resulting policy demonstrates robust locomotion across diverse terrains. The robot can overcome obstacles up to 50% of its leg length while accurately tracking speeds of up to 0.8 m/s, as validated in simulation on Bruce, a kid-sized humanoid robot platform.
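The abstract describes a modular pipeline in which a heuristic planner turns user velocity/orientation inputs and a local height field into footstep commands (position, orientation, height) with two-step foresight per leg, which a learned policy then executes. The sketch below illustrates that planner-to-policy interface under stated assumptions: the names (`FootstepCommand`, `heuristic_planner`), the grid parameters, and the placement rule are all illustrative, not the authors' actual API or algorithm.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FootstepCommand:
    """One footstep command, as described in the abstract (illustrative fields)."""
    position: np.ndarray   # desired (x, y) foot placement in the body frame [m]
    yaw: float             # desired foot orientation [rad]
    height: float          # terrain height at the placement, read from the height field [m]

def heuristic_planner(velocity, yaw_rate, height_field, cell_size=0.05,
                      step_time=0.4, foresight=2):
    """Generate `foresight` upcoming footstep commands for one leg by
    integrating the commanded velocity and sampling the local height field.
    The height field is a 2-D grid centered on the robot (an assumption)."""
    commands = []
    pos = np.zeros(2)
    yaw = 0.0
    for _ in range(foresight):
        pos = pos + np.asarray(velocity) * step_time   # advance the nominal placement
        yaw += yaw_rate * step_time
        # Look up terrain height at the planned cell, clamped to the map bounds,
        # so the low-level policy always receives a physically feasible command.
        i = int(np.clip(pos[0] / cell_size + height_field.shape[0] // 2,
                        0, height_field.shape[0] - 1))
        j = int(np.clip(pos[1] / cell_size + height_field.shape[1] // 2,
                        0, height_field.shape[1] - 1))
        commands.append(FootstepCommand(pos.copy(), yaw, float(height_field[i, j])))
    return commands

# Example: walk forward at 0.5 m/s over a flat map with a 10 cm block ahead.
height_field = np.zeros((40, 40))
height_field[25:30, 18:22] = 0.1
cmds = heuristic_planner(velocity=(0.5, 0.0), yaw_rate=0.0,
                         height_field=height_field)
for c in cmds:
    print(c.position, round(c.height, 2))
```

With two-step foresight the policy sees both upcoming placements at once, so in this toy run the second command already carries the obstacle's 0.1 m height, letting the controller adapt before the step is taken.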