Open Access
UAV‐enabled computation migration for complex missions: A reinforcement learning approach
Author(s) -
Zhu Shichao,
Gui Lin,
Cheng Nan,
Zhang Qi,
Sun Fei,
Lang Xiupu
Publication year - 2020
Publication title -
IET Communications
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.355
H-Index - 62
eISSN - 1751-8636
pISSN - 1751-8628
DOI - 10.1049/iet-com.2019.1188
Subject(s) - reinforcement learning, computer science, markov decision process, benchmark (surveying), task (project management), computation, enhanced data rates for gsm evolution, distributed computing, partially observable markov decision process, convergence (economics), edge computing, process (computing), markov chain, real time computing, markov process, artificial intelligence, machine learning, markov model, algorithm, statistics, mathematics, management, geodesy, economic growth, economics, geography, operating system
The implementation of computation offloading is a challenging issue in remote areas where traditional edge infrastructures are sparsely deployed. In this study, the authors propose an unmanned aerial vehicle (UAV)‐enabled edge computing framework, in which a group of UAVs fly around to provide near‐user edge computing services. They study the computation migration problem for complex missions, which can be decomposed into typical task‐flows that capture the inter‐dependency of tasks. Each time a task appears, it should be allocated to a proper UAV for execution; this allocation is defined as computation migration or task migration. Since the UAV‐ground communication data rate is strongly associated with the UAV's location, selecting a proper UAV to execute each task can substantially reduce the mission response time. They formulate the computation migration decision‐making problem as a Markov decision process, in which the state contains observations extracted from the environment. To cope with the dynamics of the environment, they propose an advantage actor–critic reinforcement learning approach that learns a near‐optimal policy on the fly. Simulation results show that the proposed approach has a desirable convergence property and significantly reduces the average response time of missions compared with a benchmark greedy method.
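To illustrate the kind of learner the abstract describes, the sketch below implements a minimal advantage actor–critic loop for a task-migration decision: the state is a feature vector extracted from the environment, the action is the index of the UAV chosen to execute the arriving task, and the reward is the negative task response time. This is not the paper's actual algorithm or simulation setup; the linear function approximation, the feature dimension, and the placeholder `step` environment are all illustrative assumptions.

```python
# Minimal advantage actor-critic (A2C) sketch for UAV task migration.
# Environment details (FEAT_DIM, the response-time model in step()) are
# illustrative placeholders, not the paper's simulation parameters.
import numpy as np

rng = np.random.default_rng(0)

N_UAVS = 4          # action space: which UAV runs the next task
FEAT_DIM = 8        # assumed size of the extracted observation vector
GAMMA = 0.95
LR_ACTOR, LR_CRITIC = 1e-2, 1e-1

theta = np.zeros((N_UAVS, FEAT_DIM))   # softmax policy weights (actor)
w = np.zeros(FEAT_DIM)                 # linear value-function weights (critic)

def policy(s):
    """Softmax distribution over candidate UAVs given state features s."""
    logits = theta @ s
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step(s, a):
    """Placeholder environment: reward is the negative response time,
    which here grows with a synthetic state feature tied to the chosen UAV."""
    response_time = 1.0 + abs(s[a % FEAT_DIM]) + 0.1 * rng.standard_normal()
    s_next = rng.standard_normal(FEAT_DIM)  # next extracted observation
    return s_next, -response_time

s = rng.standard_normal(FEAT_DIM)
for t in range(5000):
    probs = policy(s)
    a = rng.choice(N_UAVS, p=probs)      # sample a UAV for the new task
    s_next, r = step(s, a)

    # One-step TD error doubles as the advantage estimate A(s, a).
    td = r + GAMMA * (w @ s_next) - (w @ s)

    # Critic: semi-gradient TD(0) update on the value weights.
    w += LR_CRITIC * td * s

    # Actor: policy-gradient step; for a softmax policy,
    # grad log pi(a|s) wrt theta_k is (1[k == a] - pi(k|s)) * s.
    grad_log = -np.outer(probs, s)
    grad_log[a] += s
    theta += LR_ACTOR * td * grad_log

    s = s_next
```

The on-policy structure mirrors the abstract's "on the fly" learning: each task arrival yields one state transition, and the actor and critic are updated immediately from the observed response time rather than from a pre-collected dataset.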