Open Access
Achieving Safe Deep Reinforcement Learning via Environment Comprehension Mechanism
Author(s) -
Pai PENG,
Fei ZHU,
Quan LIU,
Peiyao ZHAO,
Wen WU
Publication year - 2021
Publication title -
Chinese Journal of Electronics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.267
H-Index - 25
eISSN - 2075-5597
pISSN - 1022-4653
DOI - 10.1049/cje.2021.07.025
Subject(s) - reinforcement learning , safer , computer science , artificial intelligence , markov decision process , stability (learning theory) , reinforcement , control (management) , comprehension , object (grammar) , process (computing) , mechanism (biology) , machine learning , markov process , computer security , engineering , mathematics , programming language , statistics , philosophy , structural engineering , epistemology , operating system
Deep reinforcement learning (DRL), which combines deep learning with reinforcement learning, has achieved great success recently. In some cases, however, during the learning process agents may reach worthless or dangerous states in which the task fails. To address this problem, we propose an algorithm, referred to as the Environment comprehension mechanism (ECM), that enables deep reinforcement learning to make safer decisions. ECM perceives hidden dangerous situations by analyzing objects and comprehending the environment, so that the agent systematically bypasses inappropriate actions through constraints set up dynamically according to states. ECM calculates the gradient of the states in the Markov tuple, sets up boundary conditions, and generates a rule that controls the direction of the agent so that it skips unsafe states. ECM can be applied to basic deep reinforcement learning algorithms to guide action selection. The experimental results show that the algorithm improved the safety and stability of the control tasks.
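The constraint idea described in the abstract can be illustrated with a minimal sketch: before executing the greedy action, the agent checks whether the predicted successor state crosses a safety boundary and, if so, falls through to the next-best action. All names here (`is_unsafe`, `predict_next_state`, the norm-based boundary, the toy dynamics) are assumptions for illustration, not the paper's actual ECM formulation.

```python
import numpy as np

def is_unsafe(state, boundary=0.9):
    """Toy safety rule (assumption): a state is unsafe if its norm
    exceeds a fixed boundary; ECM derives such boundaries dynamically."""
    return np.linalg.norm(state) > boundary

def predict_next_state(state, action, step=0.1):
    """Toy one-step dynamics model (assumption): each discrete action
    nudges the 2-D state along one axis."""
    directions = {0: np.array([step, 0.0]), 1: np.array([-step, 0.0]),
                  2: np.array([0.0, step]), 3: np.array([0.0, -step])}
    return state + directions[action]

def safe_action(state, q_values):
    """Pick the highest-Q action whose predicted successor is safe;
    fall back to the plain greedy action if no action looks safe."""
    order = np.argsort(q_values)[::-1]          # actions, best Q first
    for a in order:
        if not is_unsafe(predict_next_state(state, int(a))):
            return int(a)
    return int(order[0])                        # no safe option: greedy

state = np.array([0.85, 0.0])
q = np.array([1.0, 0.2, 0.5, 0.4])  # action 0 has the best Q-value
# Action 0 would push the norm past the boundary, so the filter
# skips it and returns the next-best safe action instead.
print(safe_action(state, q))
```

In this sketch the constraint acts as a filter layered on top of any value-based policy, which mirrors the abstract's claim that ECM can be applied to basic DRL algorithms to guide action selection.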
