
Comparison of optimized Markov Decision Process using Dynamic Programming and Temporal Differencing – A reinforcement learning approach
Author(s) -
Annapoorni Mani,
Shahriman Abu Bakar,
Pranesh Krishnan,
Sazali Yaacob
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2107/1/012026
Subject(s) - scrap, reinforcement learning, markov decision process, computer science, process (computing), raw material, dynamic programming, statistical process control, markov process, control (management), quality (philosophy), raw data, machine learning, artificial intelligence, engineering, mathematics, algorithm, statistics, mechanical engineering, chemistry, organic chemistry, operating system, philosophy, programming language, epistemology
Reinforcement learning is a promising approach for operations research problems. The incoming inspection process in any manufacturing plant aims to control quality, reduce manufacturing costs, and eliminate scrap and process-failure downtime caused by non-conforming raw materials. Predicting the raw material acceptance rate can guide raw material supplier selection and improve the manufacturing process by filtering out non-conformities. This paper presents a Markov model developed to estimate the probability of raw material being accepted or rejected in an incoming inspection environment. The proposed forecasting model is then optimized for efficiency using two reinforcement learning algorithms, dynamic programming and temporal differencing. The results of the two optimized models are compared, and the findings are discussed.
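The abstract does not specify the paper's state space, transition probabilities, or rewards, so the following is a minimal sketch under stated assumptions only: a hypothetical two-outcome inspection model where an arriving lot is either accepted or rejected with an assumed 90/10 split, and illustrative rewards of +1/-1. It shows the shape of the comparison the paper describes, estimating the same state value by dynamic programming (Bellman backups over a known model) and by temporal-difference learning (TD(0) over sampled lots). All names and numbers here are assumptions, not the authors' model.

```python
# A toy incoming-inspection Markov model: a lot "arrives", then is
# "accepted" or "rejected". The state value of "arrived" is estimated
# with dynamic programming and with TD(0). All quantities are assumed.

import random

STATES = ["arrived", "accepted", "rejected"]
TERMINAL = {"accepted", "rejected"}

# Assumed transition model: 90% of lots conform and are accepted.
P = {"arrived": [("accepted", 0.9), ("rejected", 0.1)]}
R = {"accepted": 1.0, "rejected": -1.0}  # illustrative rewards on entry
GAMMA = 1.0                              # episodic task, no discounting


def value_iteration(tol=1e-8):
    """Dynamic programming: sweep Bellman backups using the known model."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s, transitions in P.items():
            v = sum(p * (R[s2] + GAMMA * V[s2]) for s2, p in transitions)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def td0(episodes=50_000, alpha=0.01, seed=0):
    """Temporal differencing: learn the same value from sampled lots only."""
    rng = random.Random(seed)
    V = {s: 0.0 for s in STATES}
    for _ in range(episodes):
        s = "arrived"
        while s not in TERMINAL:
            s2 = rng.choices([t for t, _ in P[s]], [p for _, p in P[s]])[0]
            # TD(0) update toward the one-step bootstrapped target.
            V[s] += alpha * (R[s2] + GAMMA * V[s2] - V[s])
            s = s2
    return V


if __name__ == "__main__":
    print("DP value of 'arrived':", value_iteration()["arrived"])  # exactly 0.8
    print("TD value of 'arrived':", td0()["arrived"])              # approx. 0.8
```

The sketch also illustrates the practical trade-off behind the paper's comparison: dynamic programming requires the full transition model up front, whereas temporal differencing approximates the same value from observed lot outcomes alone.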