LLM-Driven Pareto-Optimal Multi-Mode Reinforcement Learning for Adaptive UAV Navigation in Urban Wind Environments
Author(s) - Jiahao Wu, Hengxu You, Bowen Sun, Jing Du
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3611336
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Autonomous drones in complex urban wind environments must balance speed, safety, and energy efficiency under highly variable conditions. Traditional single-policy reinforcement learning controllers often perform poorly when exposed to scenarios beyond their training. We introduce a Pareto-optimal multi-mode framework that trains three specialized unmanned aerial vehicle (UAV) policies (aggressive, balanced, and cautious) via proximal policy optimization (PPO) with specific reward scalings, yielding controllers that collectively span the speed-safety-energy trade-off surface. To automate mode selection, we fine-tune a large language model (LLM) on 30,000 simulation-derived environment-performance tuples, allowing it to predict the optimal policy from building density, wind speed and orientation, battery state, and recent flight history. In a Unity-based Manhattan simulation with computational fluid dynamics (CFD) wind fields across four headings and 10 speed levels, the LLM-driven decision maker reduces average flight time by 16%, lowers the collision rate by 50%, and saves 18% energy compared to any single mode, while preserving nondominated trade-off performance. The decision maker also generalizes to unseen wind patterns and layouts without handcrafted heuristics, demonstrating the promise of combining Pareto-optimal reinforcement learning (RL) with LLM-based meta-decision making for UAV autonomy.
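The "nondominated" property mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes each flight mode is summarized by three objectives to be minimized (flight time, collision rate, energy). The function names and the numeric values are hypothetical placeholders chosen for illustration; they are not results or code from the paper.

```python
from typing import Dict, List, Tuple

# (flight_time, collision_rate, energy); all three are minimized in this sketch.
Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, obj in candidates.items():
        dominated = any(
            dominates(other, obj)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder values for illustration only (assumed, not taken from the paper).
modes = {
    "aggressive": (60.0, 0.20, 1.30),
    "balanced": (75.0, 0.10, 1.00),
    "cautious": (95.0, 0.04, 0.90),
    "dominated_example": (100.0, 0.25, 1.40),
}

front = pareto_front(modes)
print(front)
```

With these placeholder values, the three specialized modes trade off against one another and all remain on the front, while the fourth entry is worse than the aggressive mode on every objective and is filtered out.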