
A study of multiple reward function performances for vehicle collision avoidance systems applying the DQN algorithm in reinforcement learning
Author(s) -
Nadia Zakaria,
M I Shapiai,
Nurbaiti Wahid
Publication year - 2021
Publication title -
IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/1176/1/012033
Subject(s) - reinforcement learning , computer science , range (aeronautics) , function (biology) , convergence (economics) , collision avoidance , collision , q learning , value (mathematics) , bellman equation , artificial intelligence , mathematical optimization , machine learning , mathematics , engineering , evolutionary biology , economic growth , economics , biology , aerospace engineering , computer security
Reinforcement Learning (RL) is an area of Machine Learning (ML) that aims to improve the actions of agents that learn from interaction with their environment. A central concern in RL is making the model's training process as effective as possible. However, network convergence in RL is often sluggish, and the agent may converge prematurely to local optima. The reward function is a useful tool for dealing with these problems and speeding up the agent's learning. Although the convergence properties of RL have been explored comprehensively, there are no specific rules for choosing the reward function, so the search for efficient reward functions remains an active field of study. This paper discusses the reward function, performs some analysis, and provides the learning agent with the extracted information to increase the speed of learning for a collision avoidance task. We present an experimental study on selecting a reward function in a simulated collision-avoidance environment for an autonomous vehicle using the DQN algorithm. The experiments were conducted in an online environment, namely the CARLA simulator. The study consists of three cases with different ranges of reward values. In Case 1, the penalty value range is 200 times larger than the reward; Case 2 is similar but applies a smaller penalty range; in Case 3, the reward and penalty values lie in the same range. The results show that Case 3 outperforms Case 1 and Case 2, reaching 94% average accuracy, while Case 1 obtains 70% and Case 2 achieves 85%. This may be due to the collision penalty being monumental in size compared to everything else in the other cases. Hence, the findings demonstrate the efficacy of exploring the reward function.
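To make the experimental design concrete, the sketch below illustrates one plausible reading of the three reward-function cases and the standard DQN bootstrap target they feed into. The specific numeric values and the helper names (reward_case_1/2/3, dqn_target, progress) are illustrative assumptions, not the paper's actual implementation; the abstract only states the relative ranges (Case 1: penalty about 200 times the reward, Case 2: a smaller penalty range, Case 3: reward and penalty in the same range).

    # Minimal sketch of the three reward/penalty configurations (values are assumed).
    def reward_case_1(collided: bool, progress: float) -> float:
        # Case 1: collision penalty range ~200x larger than the per-step reward.
        return -200.0 if collided else 1.0 * progress

    def reward_case_2(collided: bool, progress: float) -> float:
        # Case 2: same structure, but with a smaller penalty range.
        return -10.0 if collided else 1.0 * progress

    def reward_case_3(collided: bool, progress: float) -> float:
        # Case 3: reward and penalty values kept in the same range.
        return -1.0 if collided else 1.0 * progress

    def dqn_target(reward: float, gamma: float, next_q_values: list[float], done: bool) -> float:
        # Standard DQN bootstrap target: r + gamma * max_a' Q(s', a'), truncated at episode end.
        return reward if done else reward + gamma * max(next_q_values)

Whichever case is chosen, the reward is consumed identically by the DQN update; only the relative scale of penalty to reward changes, which is the variable the study compares.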