Compositional Shield Synthesis for Safe Reinforcement Learning in Partial Observability

Open Access

Author(s) - Steven Carr, Georgios Bakirtzis, Ufuk Topcu

Publication year - 2025

Publication title - IEEE Open Journal of Control Systems

Language(s) - English

Resource type - Journals

ISSN - 2694-085X

DOI - 10.1109/ojcsys.2025.3611725

Subject(s) - robotics and control systems

Agents controlled by the output of reinforcement learning (RL) algorithms often transition to unsafe states, particularly in uncertain and partially observable environments. Partially observable Markov decision processes (POMDPs) provide a natural setting for studying such scenarios with limited sensing. Shields filter undesirable actions to ensure safe RL by preserving safety requirements in the agent's policy. However, synthesizing holistic shields is computationally expensive in complex deployment scenarios. We propose the compositional synthesis of shields by modeling safety requirements in parts, thereby improving scalability. In particular, problem formulations in the form of POMDPs, solved with RL algorithms, illustrate that an RL agent equipped with the resulting compositional shield, beyond being safe, converges to higher values of expected reward. By using subproblem formulations, we preserve and improve the ability of shielded agents to learn in fewer training episodes than unshielded agents, especially in sparse-reward settings. Concretely, we find that compositional shield synthesis allows an RL agent to remain safe in environments two orders of magnitude larger than those handled by other state-of-the-art model-based approaches.
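The idea of a shield as an action filter, and of composing several small shields rather than building one holistic shield, can be illustrated with a toy sketch. This is not the authors' synthesis procedure: the representation of a belief as a set of possible states, the composition by set intersection, and every name below (`make_shield`, `compose`, `shielded_choice`, the states and actions) are assumptions made only for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

State = str
Action = str
# Assumption: a shield maps a belief support (the states the agent
# might currently be in) to the set of actions it allows.
Shield = Callable[[FrozenSet[State]], Set[Action]]

ACTIONS: Set[Action] = {"left", "right", "stay"}  # hypothetical action set


def make_shield(unsafe: Dict[State, Set[Action]]) -> Shield:
    """Build a toy shield from a map: state -> actions that are unsafe there."""
    def shield(support: FrozenSet[State]) -> Set[Action]:
        blocked: Set[Action] = set()
        # Under partial observability, block an action if it is unsafe
        # in ANY state the agent might be in.
        for state in support:
            blocked |= unsafe.get(state, set())
        return ACTIONS - blocked
    return shield


def compose(shields: Iterable[Shield]) -> Shield:
    """Compose sub-shields: an action is allowed only if every part allows it."""
    parts: List[Shield] = list(shields)

    def composed(support: FrozenSet[State]) -> Set[Action]:
        allowed = set(ACTIONS)
        for part in parts:
            allowed &= part(support)
        return allowed
    return composed


def shielded_choice(preferences: List[Action], allowed: Set[Action]) -> Action:
    """Return the policy's most preferred action that the shield allows."""
    for action in preferences:
        if action in allowed:
            return action
    raise ValueError("shield allows no action in this belief")


# Two safety requirements, each modeled as its own small shield.
avoid_cliff = make_shield({"edge_left": {"left"}})
avoid_wall = make_shield({"edge_right": {"right"}})
shield = compose([avoid_cliff, avoid_wall])

# The agent cannot tell which edge it is at, so both moves are filtered.
belief = frozenset({"edge_left", "edge_right"})
allowed_actions = shield(belief)
chosen = shielded_choice(["left", "right", "stay"], allowed_actions)
```

In this sketch each requirement is handled by a separate, small shield, and the learning agent only ever samples from the intersection of what the parts permit. Whether intersection is a sound composition rule depends on the safety requirements involved, so it should be read as a simplification rather than a guarantee.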
