Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes | Zendy

Steven Carr | Zendy; Nils Jansen | Zendy; Ufuk Topcu | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Open Access

Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes

Author(s) -

Steven Carr,

Nils Jansen,

Ufuk Topcu

Publication year - 2021

Publication title -

journal of artificial intelligence research/the journal of artificial intelligence research

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.79

H-Index - 123

eISSN - 1943-5037

pISSN - 1076-9757

DOI - 10.1613/jair.1.12963

Subject(s) - partially observable markov decision process , computer science , markov decision process , recurrent neural network , temporal logic , markov chain , artificial intelligence , probabilistic logic , markov process , machine learning , markov model , theoretical computer science , artificial neural network , mathematics , statistics

Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNN) as effective representations of POMDP policies that can efficiently process sequential data. However, it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints, for instance, given by temporal logic specifications. We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints. Implemented on a POMDP, they induce a Markov chain, and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. Using such methods, if the Markov chain does not satisfy the specification, a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification. The method exploits this diagnostic information to either adjust the complexity of the extracted FSC or improve the policy by performing focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states, which are three orders of magnitude larger than comparable approaches.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.

Having issues? You can contact us here

Empowering knowledge with every search

About

About Careers Publisher Partners Contact Us

Learn

FAQs Blog Terms of Use Privacy Policy

About

Learn

Discover

Explore