Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes

Steven Carr; Nils Jansen; Ufuk Topcu

doi:10.1613/jair.1.12963

Published: Nov 18, 2021

Keywords: markov decision processes, neural networks, uncertainty

Steven Carr

The University of Texas at Austin

Nils Jansen

Radboud University, Nijmegen, The Netherlands

Ufuk Topcu

The University of Texas at Austin

Abstract

Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNNs) as effective representations of POMDP policies that can efficiently process sequential data. However, it is hard to verify whether a POMDP driven by such an RNN-based policy satisfies safety constraints given, for instance, by temporal logic specifications. We propose a novel method that combines techniques from machine learning and formal methods: it trains an RNN-based policy and then automatically extracts a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints. Applied to a POMDP, an FSC induces a Markov chain, and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. If the Markov chain does not satisfy the specification, a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification. The method exploits this diagnostic information to either adjust the complexity of the extracted FSC or improve the policy by performing focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states, three orders of magnitude larger than those handled by comparable approaches.
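
To make the verification step in the abstract concrete, the following is a minimal sketch of how an FSC applied to a POMDP induces a Markov chain over (state, memory node) pairs, and how a reachability probability on that chain can be computed. Everything in it is an assumption for illustration and is not taken from the article: the four-state toy POMDP, the names `TRANSITIONS`, `OBSERVATION`, `induced_chain`, and `reach_probability`, and the threshold 0.2. The fixed-point iteration stands in for a probabilistic model checker, and the specification is simplified to "reach `bad` with probability at most 0.2".

```python
from collections import defaultdict

# Hypothetical toy POMDP (not from the article).
# States "s0" and "s1" emit the same observation "x", so they are
# indistinguishable without memory. "goal" and "bad" are absorbing.
TRANSITIONS = {
    ("s0", "a"): {"s1": 0.9, "bad": 0.1},
    ("s0", "b"): {"bad": 1.0},
    ("s1", "a"): {"bad": 1.0},
    ("s1", "b"): {"goal": 1.0},
}
OBSERVATION = {"s0": "x", "s1": "x"}
ABSORBING = ("goal", "bad")


def induced_chain(action_dist, memory_update, nodes):
    """Build the Markov chain over (state, node) pairs induced by an FSC.

    action_dist[(node, obs)]   -> {action: probability}
    memory_update[(node, obs)] -> next memory node
    """
    chain = {}
    for state in list(OBSERVATION) + list(ABSORBING):
        for node in nodes:
            if state in ABSORBING:
                # Absorbing states loop on themselves.
                chain[(state, node)] = {(state, node): 1.0}
                continue
            obs = OBSERVATION[state]
            next_node = memory_update[(node, obs)]
            row = defaultdict(float)
            for action, p_act in action_dist[(node, obs)].items():
                for succ, p_succ in TRANSITIONS[(state, action)].items():
                    row[(succ, next_node)] += p_act * p_succ
            chain[(state, node)] = dict(row)
    return chain


def reach_probability(chain, start, target, sweeps=100):
    """Approximate P(eventually reach `target`) by fixed-point iteration."""
    value = {s: 1.0 if s[0] == target else 0.0 for s in chain}
    for _ in range(sweeps):
        value = {
            s: 1.0 if s[0] == target
            else sum(p * value[t] for t, p in chain[s].items())
            for s in chain
        }
    return value[start]


# One-node FSC: no memory, so it must randomize over actions on "x".
one_node = induced_chain(
    action_dist={(0, "x"): {"a": 0.5, "b": 0.5}},
    memory_update={(0, "x"): 0},
    nodes=[0],
)
# Two-node FSC: play "a" first, then remember that and play "b".
two_node = induced_chain(
    action_dist={(0, "x"): {"a": 1.0}, (1, "x"): {"b": 1.0}},
    memory_update={(0, "x"): 1, (1, "x"): 1},
    nodes=[0, 1],
)

THRESHOLD = 0.2  # hypothetical bound on the probability of reaching "bad"
p_one = reach_probability(one_node, ("s0", 0), "bad")
p_two = reach_probability(two_node, ("s0", 0), "bad")
print(f"1-node FSC: P(bad) = {p_one:.3f}, satisfies spec: {p_one <= THRESHOLD}")
print(f"2-node FSC: P(bad) = {p_two:.3f}, satisfies spec: {p_two <= THRESHOLD}")
```

In this toy setting the one-node controller violates the bound while the two-node controller meets it, which loosely mirrors the idea of adjusting the complexity of the extracted FSC when verification fails. The sketch does not model RNN training, FSC extraction, or the diagnostic information used for focused retraining.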

Issue

Vol. 72 (2021)

Section

Articles
