What Matters in Learning a Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study

Jiayu Chen, Chao Yu, Yuqing Xie, Feng Gao, Yinuo Chen, Shu'ang Yu, Wenhao Tang, Shilong Ji, Mo Mu, Yi Wu, Huazhong Yang, Yu Wang


Open Access



Publication year - 2025

Publication title - IEEE Robotics and Automation Letters

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 1.123

H-Index - 56

eISSN - 2377-3766

DOI - 10.1109/lra.2025.3575011

Subject(s) - robotics and control systems; computing and processing; components, circuits, devices and systems

Precise and agile flight maneuvers are essential for quadrotor applications, yet traditional control methods are limited by their reliance on flat trajectories or computationally intensive optimization. Reinforcement learning (RL)-based policies offer a promising alternative by directly mapping observations to actions, reducing dependency on system knowledge and actuation constraints. However, the sim-to-real gap remains a significant challenge, often causing instability in real-world deployments. In this work, we identify five key factors for learning robust RL-based control policies capable of zero-shot real-world deployment: (1) including velocity and the rotation matrix in the actor's inputs, (2) including a time vector in the critic's inputs, (3) regularizing the difference between consecutive actions for smoothness, (4) applying system identification with selective randomization, and (5) using large batch sizes during training. Based on these insights, we develop SimpleFlight, a PPO-based framework that integrates these techniques. Extensive experiments on the Crazyflie quadrotor demonstrate that SimpleFlight reduces trajectory tracking error by over 50% compared to state-of-the-art RL baselines. It excels on both smooth polynomial trajectories and challenging infeasible zigzag trajectories, particularly on quadrotors with a small thrust-to-weight ratio, where baseline methods often fail. To support reproducibility and further research, we integrate SimpleFlight into the GPU-based Omnidrones simulator and provide open-source code and model checkpoints.
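Factors (1) to (3) above can be illustrated with a minimal sketch. It is not the SimpleFlight implementation: the abstract does not give the exact observation layout, the time encoding, or the form of the smoothness term, so the function names, the repeated-progress time vector, the Euclidean-norm penalty, and the default weight of 0.1 are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def actor_observation(rel_pos: Vector, lin_vel: Vector,
                      rot_matrix: Sequence[Vector]) -> List[float]:
    """Factor (1): actor input with velocity and a flattened 3x3 rotation matrix.

    The ordering (position error, velocity, rotation) is an assumption.
    """
    flat_rot = [x for row in rot_matrix for x in row]
    return list(rel_pos) + list(lin_vel) + flat_rot


def critic_observation(actor_obs: Vector, step: int, horizon: int,
                       time_dim: int = 4) -> List[float]:
    """Factor (2): critic input extended with a time vector.

    Hypothetical encoding: normalized episode progress repeated time_dim times.
    """
    progress = step / horizon
    return list(actor_obs) + [progress] * time_dim


def action_difference_penalty(action: Vector, prev_action: Vector,
                              weight: float = 0.1) -> float:
    """Factor (3): penalty on the change between consecutive actions.

    Uses weight * ||a_t - a_{t-1}||_2; the paper's exact form may differ.
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    return weight * diff_sq ** 0.5


def shaped_reward(tracking_reward: float, action: Vector,
                  prev_action: Vector, weight: float = 0.1) -> float:
    """Subtract the smoothness penalty from a task (tracking) reward."""
    return tracking_reward - action_difference_penalty(action, prev_action, weight)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    obs = actor_observation([0.1, 0.0, -0.2], [0.0, 0.0, 0.0], identity)
    value_obs = critic_observation(obs, step=50, horizon=200)
    print(len(obs), len(value_obs))
    print(shaped_reward(1.0, [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
```

In this sketch only the critic sees the time vector, so the deployed actor depends solely on quantities that can be measured on the vehicle. The penalty weight trades tracking accuracy against command smoothness and would need tuning for any real setup.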
