Distributed Reinforcement Learning in Emergency Response Simulation
Author(s) - Cesar Lopez, Jose R. Marti, Sarbjit Sarkaria
Publication year - 2018
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2878894
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
This paper presents the implementation of a coordinated decision-making agent for emergency response scenarios. The agent is implemented with reinforcement learning (RL), a machine learning technique that enables an agent to learn through experimentation. The agent’s learning is driven by rewards: feedback signals proportional to how good its actions are. The simulation platform was the infrastructure interdependencies simulator, on which we tested the suitability of the approach in previous studies. In this paper, we add new features to our previous solution to enable faster convergence and distributed processing. These additions include an enhanced reward scheme and a scheduler that orchestrates the distributed training. We include two test cases. The first is a compact model with four critical infrastructures. In this model, training the agent required only 10% of the attempts needed by our previous version; the improvement in convergence comes from adding a shaping reward scheme. We trained the agent across 24 simultaneous configurations of the model, and the training process took 4 min. The extended case includes more infrastructures and a higher level of detail. Although the dimensionality of the problem grew by a factor of 4000, training converged in fewer episodes. We tested the extended model over 96 parallel instances (potential scenarios), completing in 2.87 min. The results show fast and stable convergence. This agent can help during multiple stages of emergency response, including real-time situations.
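The abstract attributes much of the convergence speed-up to a shaping reward added on top of the RL reward signal. It does not specify the authors' scheme, so the sketch below assumes potential-based reward shaping, r' = r + γΦ(s') − Φ(s), a standard form that leaves the optimal policy unchanged, applied to tabular Q-learning. The toy chain environment, the potential function `potential`, and every hyperparameter are hypothetical illustrations, not the authors' simulator or agent.

```python
import random

# Hypothetical toy environment (not the paper's simulator): a chain of states
# 0..N_STATES-1, where reaching the last state ends the episode with reward 1.
N_STATES = 10
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right
GAMMA = 0.95


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # sparse environment reward
    return next_state, reward, done


def potential(state):
    """Assumed potential Phi(s): fraction of the way to the goal (hypothetical)."""
    return state / (N_STATES - 1)


def shaped_reward(state, next_state, reward, done):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s), with Phi(terminal) = 0."""
    next_phi = 0.0 if done else potential(next_state)
    return reward + GAMMA * next_phi - potential(state)


def train(episodes, use_shaping, alpha=0.5, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning; returns the Q-table and the step count of each episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    steps_per_episode = []
    for _ in range(episodes):
        state = 0
        for t in range(1, max_steps + 1):
            # Epsilon-greedy action choice, breaking exact ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            if use_shaping:
                reward = shaped_reward(state, next_state, reward, done)
            # One-step Q-learning target; terminal states have no future value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode


def greedy_policy(q):
    """Greedy action per non-terminal state: 'R' (right) or 'L' (left)."""
    return "".join("R" if row[1] > row[0] else "L" for row in q[:-1])


if __name__ == "__main__":
    for shaping in (False, True):
        q, steps = train(episodes=500, use_shaping=shaping)
        early = sum(steps[:20]) / 20
        print(f"shaping={shaping}: mean steps (first 20 episodes)={early:.1f}, "
              f"policy={greedy_policy(q)}")
```

Comparing the early-episode step counts with and without shaping gives a rough feel for the effect the abstract describes on this toy problem only; it does not reproduce the paper's 10% figure, and the distributed scheduler is not modeled here.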