An efficient actor‐critic reinforcement learning for device‐to‐device communication underlaying sectored cellular network
Author(s) - Khuntia Pratap, Hazra Ranjay, Chong Peter
Publication year - 2020
Publication title - International Journal of Communication Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.344
H-Index - 49
eISSN - 1099-1131
pISSN - 1074-5351
DOI - 10.1002/dac.4315
Subject(s) - computer science, reinforcement learning, cellular network, queueing theory, power control, resource allocation, channel (broadcasting), base station, cluster analysis, transmission (telecommunications), throughput, mathematical optimization, channel allocation schemes, computer network, power (physics), distributed computing, wireless, telecommunications, artificial intelligence, physics, mathematics, quantum mechanics
Summary In this paper, a novel reinforcement learning (RL) approach with cell sectoring is proposed to solve the channel and power allocation problem for a device-to-device (D2D)-enabled cellular network when prior traffic information is not known to the base station (BS). Further, the paper derives an optimal policy for resource and power allocation among users with the aim of maximizing the sum-rate of the overall system. Since the wireless channel and the traffic requests of users are stochastic in nature, the dynamic behavior of the environment allows us to employ an actor-critic RL technique that learns the best policy through continuous interaction with the surroundings. The proposed work comprises four phases: cell splitting, clustering, a queuing model, and simultaneous channel and power allocation using actor-critic RL. Cell splitting combined with the novel clustering technique increases network coverage, reduces co-channel cell interference, and minimizes the transmission power of nodes, whereas the queuing model addresses the waiting time of users under priority-based data transmission. Operating over a continuous state-action space, the policy-gradient-based actor-critic algorithm improves the overall system sum-rate as well as the D2D throughput. The actor adopts a parameterized stochastic policy to generate continuous actions, while the critic estimates the value of the policy and criticizes the actor's actions, which reduces the high variance of the policy gradient. Numerical simulations verify the benefit of the proposed resource sharing scheme over existing traditional schemes.
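To make the actor-critic mechanism described above more concrete, the following is a minimal sketch of a policy-gradient actor with a TD-error critic over a continuous action (e.g., a transmit power level). The environment step, reward shape, feature dimension, and learning rates are hypothetical stand-ins and are not taken from the paper; only the overall structure (a parameterized Gaussian policy for continuous actions, with the critic's TD error used as a low-variance weight on the policy-gradient update) follows the approach outlined in the abstract.

```python
# Illustrative actor-critic sketch for continuous power allocation.
# All environment details and hyperparameters are assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4        # assumed feature dimension (e.g., channel gains, queue lengths)
GAMMA = 0.95         # discount factor (assumption)
ALPHA_ACTOR = 1e-3   # actor (policy) learning rate (assumption)
ALPHA_CRITIC = 1e-2  # critic (value) learning rate (assumption)

theta_mu = np.zeros(STATE_DIM)   # mean parameters of the Gaussian policy
sigma = 0.2                      # fixed exploration noise (assumption)
w_value = np.zeros(STATE_DIM)    # linear critic weights

def policy_sample(state):
    """Sample a continuous action (power level in [0, 1]) from the Gaussian policy."""
    mu = state @ theta_mu
    action = rng.normal(mu, sigma)
    return float(np.clip(action, 0.0, 1.0)), mu

def toy_env_step(state, action):
    """Hypothetical one-step environment: reward mimics a rate vs. interference trade-off."""
    reward = np.log1p(action * state[0]) - 0.5 * action * state[1]
    next_state = rng.random(STATE_DIM)   # stand-in for stochastic channel/traffic dynamics
    return next_state, reward

state = rng.random(STATE_DIM)
for step in range(5000):
    action, mu = policy_sample(state)
    next_state, reward = toy_env_step(state, action)

    # Critic: TD(0) error acts as a low-variance advantage estimate.
    td_error = reward + GAMMA * (next_state @ w_value) - (state @ w_value)
    w_value += ALPHA_CRITIC * td_error * state

    # Actor: policy-gradient step on the Gaussian mean, weighted by the critic's TD error.
    grad_log_pi_mu = (action - mu) / (sigma ** 2) * state
    theta_mu += ALPHA_ACTOR * td_error * grad_log_pi_mu

    state = next_state
```

In this sketch the critic's TD error replaces the raw return in the policy-gradient update, which is the variance-reduction role the abstract attributes to the critic; the multi-agent D2D setting, cell sectoring, clustering, and queuing phases of the paper are not modeled here.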