
A study of multiple reward function performances for vehicle collision avoidance systems applying the DQN algorithm in reinforcement learning
Author(s) -
Nadia Zakaria,
M I Shapiai,
Nurbaiti Wahid
Publication year - 2021
Publication title -
IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/1176/1/012033
Subject(s) - reinforcement learning , computer science , range (aeronautics) , function (biology) , convergence (economics) , collision avoidance , collision , q learning , value (mathematics) , bellman equation , artificial intelligence , mathematical optimization , machine learning , mathematics , engineering , evolutionary biology , economic growth , economics , biology , aerospace engineering , computer security
Reinforcement Learning (RL) is an area of Machine Learning (ML) that aims to improve the actions of agents that learn from interaction with their environment. A central concern in RL is making the model's training process as effective as possible. However, network convergence in RL is often sluggish, and the agent may converge prematurely to local optima. The reward function is a useful tool for dealing with these problems and speeding up the agent's learning. Although the convergence properties of RL have been explored comprehensively, there are no specific rules for choosing the reward function, so the search for efficient reward functions remains an active field of study. This paper discusses the reward function, performs some analysis, and provides the learning agent with the extracted information to increase the speed of learning for a collision avoidance task. We present an experimental study on selecting a reward function in a simulated collision-avoidance environment for an autonomous vehicle using the DQN algorithm. The experiments were conducted in an online environment, namely the CARLA simulator. The study consists of three cases with different ranges of reward values. In Case 1, the penalty value range is 200 times larger than the reward; Case 2 is similar but applies a smaller penalty range; in Case 3, the reward and penalty values lie in the same range. The results show that Case 3 outperforms Case 1 and Case 2, reaching 94% average accuracy, while Case 1 obtains 70% and Case 2 achieves 85%. This may be due to the collision penalty being monumental in size compared to everything else in the other cases. Hence, the findings demonstrate the efficacy of exploring the reward function.
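To make the experimental design concrete, the sketch below illustrates one plausible reading of the three reward-function cases and the standard DQN bootstrap target they feed into. The specific numeric values and the helper names (reward_case_1/2/3, dqn_target, progress) are illustrative assumptions, not the paper's actual implementation; the abstract only states the relative ranges (Case 1: penalty about 200 times the reward, Case 2: a smaller penalty range, Case 3: reward and penalty in the same range).

    # Minimal sketch of the three reward/penalty configurations (values are assumed).
    def reward_case_1(collided: bool, progress: float) -> float:
        # Case 1: collision penalty range ~200x larger than the per-step reward.
        return -200.0 if collided else 1.0 * progress

    def reward_case_2(collided: bool, progress: float) -> float:
        # Case 2: same structure, but with a smaller penalty range.
        return -10.0 if collided else 1.0 * progress

    def reward_case_3(collided: bool, progress: float) -> float:
        # Case 3: reward and penalty values kept in the same range.
        return -1.0 if collided else 1.0 * progress

    def dqn_target(reward: float, gamma: float, next_q_values: list[float], done: bool) -> float:
        # Standard DQN bootstrap target: r + gamma * max_a' Q(s', a'), truncated at episode end.
        return reward if done else reward + gamma * max(next_q_values)

Whichever case is chosen, the reward is consumed identically by the DQN update; only the relative scale of penalty to reward changes, which is the variable the study compares.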