An efficient actor‐critic reinforcement learning for device‐to‐device communication underlaying sectored cellular network
Author(s) - Khuntia Pratap, Hazra Ranjay, Chong Peter
Publication year - 2020
Publication title - International Journal of Communication Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.344
H-Index - 49
eISSN - 1099-1131
pISSN - 1074-5351
DOI - 10.1002/dac.4315
Subject(s) - computer science, reinforcement learning, cellular network, queueing theory, power control, resource allocation, channel (broadcasting), base station, cluster analysis, transmission (telecommunications), throughput, mathematical optimization, channel allocation schemes, computer network, power (physics), distributed computing, wireless, telecommunications, artificial intelligence, physics, mathematics, quantum mechanics
Summary In this paper, a novel reinforcement learning (RL) approach with cell sectoring is proposed to solve the channel and power allocation problem for a device-to-device (D2D)-enabled cellular network when prior traffic information is not known to the base station (BS). Further, the paper derives an optimal policy for resource and power allocation among users with the aim of maximizing the sum-rate of the overall system. Since the wireless channel and the traffic requests of users are stochastic in nature, the dynamic behavior of the environment allows us to employ an actor-critic RL technique that learns the best policy through continuous interaction with the surroundings. The proposed work comprises four phases: cell splitting, clustering, a queuing model, and simultaneous channel and power allocation using actor-critic RL. Cell splitting combined with the novel clustering technique increases network coverage, reduces co-channel cell interference, and minimizes the transmission power of nodes, whereas the queuing model addresses the waiting time of users under priority-based data transmission. Operating over a continuous state-action space, the policy-gradient-based actor-critic algorithm improves the overall system sum-rate as well as the D2D throughput. The actor adopts a parameterized stochastic policy to generate continuous actions, while the critic estimates the value of the policy and criticizes the actor's actions, which reduces the high variance of the policy gradient. Numerical simulations verify the benefit of the proposed resource sharing scheme over existing traditional schemes.
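To make the actor-critic mechanism described above more concrete, the following is a minimal sketch of a policy-gradient actor with a TD-error critic over a continuous action (e.g., a transmit power level). The environment step, reward shape, feature dimension, and learning rates are hypothetical stand-ins and are not taken from the paper; only the overall structure (a parameterized Gaussian policy for continuous actions, with the critic's TD error used as a low-variance weight on the policy-gradient update) follows the approach outlined in the abstract.

```python
# Illustrative actor-critic sketch for continuous power allocation.
# All environment details and hyperparameters are assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4        # assumed feature dimension (e.g., channel gains, queue lengths)
GAMMA = 0.95         # discount factor (assumption)
ALPHA_ACTOR = 1e-3   # actor (policy) learning rate (assumption)
ALPHA_CRITIC = 1e-2  # critic (value) learning rate (assumption)

theta_mu = np.zeros(STATE_DIM)   # mean parameters of the Gaussian policy
sigma = 0.2                      # fixed exploration noise (assumption)
w_value = np.zeros(STATE_DIM)    # linear critic weights

def policy_sample(state):
    """Sample a continuous action (power level in [0, 1]) from the Gaussian policy."""
    mu = state @ theta_mu
    action = rng.normal(mu, sigma)
    return float(np.clip(action, 0.0, 1.0)), mu

def toy_env_step(state, action):
    """Hypothetical one-step environment: reward mimics a rate vs. interference trade-off."""
    reward = np.log1p(action * state[0]) - 0.5 * action * state[1]
    next_state = rng.random(STATE_DIM)   # stand-in for stochastic channel/traffic dynamics
    return next_state, reward

state = rng.random(STATE_DIM)
for step in range(5000):
    action, mu = policy_sample(state)
    next_state, reward = toy_env_step(state, action)

    # Critic: TD(0) error acts as a low-variance advantage estimate.
    td_error = reward + GAMMA * (next_state @ w_value) - (state @ w_value)
    w_value += ALPHA_CRITIC * td_error * state

    # Actor: policy-gradient step on the Gaussian mean, weighted by the critic's TD error.
    grad_log_pi_mu = (action - mu) / (sigma ** 2) * state
    theta_mu += ALPHA_ACTOR * td_error * grad_log_pi_mu

    state = next_state
```

In this sketch the critic's TD error replaces the raw return in the policy-gradient update, which is the variance-reduction role the abstract attributes to the critic; the multi-agent D2D setting, cell sectoring, clustering, and queuing phases of the paper are not modeled here.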