
Data-Efficient MADDPG Based on Self-Attention for IoT Energy Management Systems
Author(s) -
Mohammed Al-Saffar,
Mustafa Gul
Publication year - 2023
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
ISSN - 2169-3536
DOI - 10.1109/ACCESS.2023.3322193
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
In this study, the simulated real-world Demand Response (DR) potential is controlled and optimized by analyzing household load characteristics from historical data. To determine the optimal DR potential in smart homes integrated with IoT energy management systems, a multi-agent reinforcement learning framework is well suited to handling the control of household appliances with stochastic behavior. However, the main problem with multi-agent systems is the nonstationary environment created by the agents themselves, which increases system uncertainty and requires an excessive number of interactions with the environment during training, leading to a data-inefficient reinforcement learning model. We therefore propose a new approach, a Multi-Agent Deep Deterministic Policy Gradient based on a Bi-directional Long Short-Term Memory network and an Attention Mechanism (BiLSTMA-MADDPG), to extract more useful information. The improved MADDPG model exploits a BiLSTM layer to store a history of experience in the MADDPG replay buffer, and an attention mechanism to reduce the model's dependency on the number of samples, since it can extract the most valuable data and ignore the less important samples. In this way, BiLSTMA-MADDPG can outperform conventional MADDPG even in a small-sample environment, motivating the exploration of a more robust and data-efficient regime. The attention mechanism thus enables MADDPG to learn more effectively and scalably in complex real-world multi-agent environments. Simulation results are obtained for a household environment with three cooperating agents that control a washing machine, an air conditioner, and an electric vehicle. The model's performance is validated, showing improved data efficiency and convergence speed, and promise for real-life application in terms of appliance energy consumption.
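The abstract's central idea is an attention mechanism that lets the agent weight the most valuable stored experience more heavily than less informative samples. A minimal sketch of that idea follows; the class name, embedding-based keys, and the sampling scheme are illustrative assumptions for exposition, not the authors' actual BiLSTMA-MADDPG implementation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

class AttentionReplayBuffer:
    """Hypothetical sketch: a replay buffer whose sampling distribution is
    given by scaled dot-product attention between a query (e.g. the current
    state embedding) and the stored state embeddings, so transitions most
    relevant to the current situation are drawn more often."""

    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.dim = dim
        self.states = []       # key embeddings, one per stored transition
        self.transitions = []  # arbitrary transition payloads

    def add(self, state_embedding, transition):
        # Drop the oldest entry once capacity is reached (FIFO).
        if len(self.states) >= self.capacity:
            self.states.pop(0)
            self.transitions.pop(0)
        self.states.append(np.asarray(state_embedding, dtype=float))
        self.transitions.append(transition)

    def sample(self, query, batch_size, rng=None):
        rng = rng or np.random.default_rng(0)
        keys = np.stack(self.states)                      # (N, dim)
        # Scaled dot-product attention scores, then normalize to weights.
        scores = keys @ np.asarray(query, dtype=float) / np.sqrt(self.dim)
        probs = softmax(scores)
        idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx], probs
```

Compared with uniform replay sampling, this kind of weighting concentrates gradient updates on the transitions the attention scores deem most relevant, which is one way the sample-count dependency described in the abstract could be reduced.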