Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm

Nesma M. Ashraf; Reham R. Mostafa; Rasha H. Sakr; M. Z. Rashad

Open Access

Publication year - 2021

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0252754

Subject(s) - hyperparameter, reinforcement learning, computer science, artificial intelligence, machine learning, mathematical optimization, algorithm, mathematics

Deep Reinforcement Learning (DRL) enables agents to make decisions based on a well-designed reward function that suits a particular environment, without any prior knowledge of that environment. The choice of hyperparameter values has a great impact on the overall learning process and on training time. Hyperparameters must be set accurately when training DRL algorithms, and doing so is one of the key challenges that we attempt to address. This paper employs a swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm and achieve an optimal control strategy in an autonomous driving control problem. DDPG can handle complex environments with continuous action spaces. To evaluate the proposed algorithm, The Open Racing Car Simulator (TORCS), a realistic autonomous driving simulation environment, was chosen for its ease of design and implementation. Using TORCS, a DDPG agent with optimized hyperparameters was compared with a DDPG agent using reference hyperparameters. The experimental results showed that optimizing DDPG's hyperparameters maximized the total rewards over the testing episodes while maintaining a stable driving policy.
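
The abstract does not spell out how WOA searches the hyperparameter space. The Python sketch below illustrates the commonly published WOA update rules (shrinking encirclement of the best solution, a logarithmic-spiral "bubble-net" move, and exploration around a random whale when |A| >= 1), used here as a maximizer of reward. It is not the authors' implementation. The names SEARCH_SPACE, surrogate_reward and woa_maximize are hypothetical; the four hyperparameters and their bounds are illustrative assumptions, since the abstract does not list which DDPG hyperparameters were tuned; and surrogate_reward is a synthetic stand-in for the expensive step of training a DDPG agent in TORCS and returning its total reward.

```python
import math
import random

# HYPOTHETICAL search space: (name, lower bound, upper bound). The abstract does
# not say which DDPG hyperparameters were tuned; these four are illustrative.
SEARCH_SPACE = [
    ("log10_actor_lr", -5.0, -2.0),
    ("log10_critic_lr", -5.0, -2.0),
    ("gamma", 0.90, 0.999),
    ("tau", 0.0001, 0.01),
]


def surrogate_reward(x):
    """HYPOTHETICAL stand-in for training a DDPG agent in TORCS and returning
    its total reward: a smooth function with a known peak, so the sketch runs."""
    target = [-4.0, -3.0, 0.99, 0.001]
    scale = [1.0, 1.0, 0.05, 0.005]
    return -sum(((xi - ti) / si) ** 2 for xi, ti, si in zip(x, target, scale))


def clip(x, space):
    """Keep every coordinate inside its [lower, upper] bound."""
    return [min(max(xi, lo), hi) for xi, (_, lo, hi) in zip(x, space)]


def woa_maximize(fitness, space, n_whales=20, max_iter=100, b=1.0, seed=0):
    """Whale Optimization Algorithm used as a maximizer of `fitness`."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lo, hi) for _, lo, hi in space] for _ in range(n_whales)]
    scores = [fitness(w) for w in whales]
    best_idx = max(range(n_whales), key=lambda i: scores[i])
    best, best_score = list(whales[best_idx]), scores[best_idx]

    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter  # shrinks linearly from 2 towards 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a  # |A| < 1 -> exploit, else explore
            C = 2.0 * rng.random()
            p = rng.random()  # 50/50 choice between encircling and spiral moves
            l = rng.uniform(-1.0, 1.0)
            x = whales[i]
            if p < 0.5:
                # Encircle the best whale (|A| < 1) or a random whale (|A| >= 1).
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [rj - A * abs(C * rj - xj) for rj, xj in zip(ref, x)]
            else:
                # Bubble-net attack: logarithmic spiral around the best whale.
                new = [abs(bj - xj) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + bj
                       for bj, xj in zip(best, x)]
            whales[i] = clip(new, space)
            scores[i] = fitness(whales[i])
            if scores[i] > best_score:  # greedy update of the global best
                best, best_score = list(whales[i]), scores[i]
    return best, best_score


if __name__ == "__main__":
    best, score = woa_maximize(surrogate_reward, SEARCH_SPACE)
    for (name, _, _), value in zip(SEARCH_SPACE, best):
        print(f"{name}: {value:.5g}")
    print(f"surrogate reward: {score:.4f}")
```

In a real run, each fitness call would train and evaluate a full DDPG agent, so the number of training runs is roughly n_whales × max_iter; in this sketch those two arguments are the main cost knobs.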
