Formal Verification of Probabilistic Deep Reinforcement Learning Policies with Abstract Training

Junfeng Yang, Min Zhang, Xin Chen, Qin Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Deep Reinforcement Learning (DRL), especially DRL with probabilistic policies, has shown great potential in learning control policies. In safety-critical domains, deploying a probabilistic DRL policy requires strict safety assurances, making it critical to verify such policies formally. However, formal verification of probabilistic DRL policies still faces significant challenges. These challenges arise from the complexity of reasoning about the neural network’s probabilistic outputs for infinite state sets and from the state explosion problem during model construction. This paper proposes a novel approach based on abstract training for quantitatively verifying probabilistic DRL policies. Specifically, we abstract the infinite continuous state space into finite discrete decision units and train a deep neural network (DNN) policy on these decision units. This abstract training allows the probabilistic decision outputs for a set of states to be computed directly in a black-box manner, greatly reducing the complexity of reasoning about neural network outputs. We further abstract the execution of the trained DNN policy as a Markov decision process (MDP) and perform probabilistic model checking, obtaining two types of upper bounds on the probability of being unsafe. When constructing the MDP, we reuse abstract states based on decision units, significantly alleviating the state explosion problem. Experiments show that the proposed probabilistic quantitative verification yields tighter upper bounds on unsafe probabilities over longer time horizons more easily and efficiently than the current state-of-the-art method.
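
The paper itself gives the formal construction; as a purely hypothetical illustration of the decision-unit idea, the Python sketch below abstracts a 2-D continuous state space into a finite grid of decision units, queries a stand-in unit-level policy as a single black-box call per unit, estimates abstract unit-to-unit transitions by sampling, and computes the probability of reaching an unsafe unit within a bounded horizon by backward induction. The grid resolution, toy dynamics, and all names (unit_of, env_step, and so on) are assumptions made for this example only; the authors instead obtain sound upper bounds via probabilistic model checking of the abstract MDP.

import numpy as np

# --- Hypothetical illustration only: grid size, dynamics, and all names
# --- below are assumptions, not the paper's construction. ---

LO, HI, N = -1.0, 1.0, 8        # state-space bounds and grid resolution
ACTIONS = (0, 1)                # two abstract actions
HORIZON = 20                    # verification time horizon
rng = np.random.default_rng(0)

def unit_of(state):
    """Map a concrete 2-D state to its decision unit (grid cell)."""
    idx = ((state - LO) / (HI - LO) * N).astype(int)
    return tuple(np.clip(idx, 0, N - 1))

# Stand-in for the abstractly trained DNN policy: because training was done
# on decision units, one black-box query yields the action distribution for
# every concrete state inside a unit.
policy = {(i, j): rng.dirichlet(np.ones(len(ACTIONS)))
          for i in range(N) for j in range(N)}

def env_step(state, action):
    """Toy continuous dynamics, purely illustrative."""
    drift = np.array([0.2, 0.0]) if action == 1 else np.array([-0.2, 0.0])
    return np.clip(state + drift + rng.normal(0, 0.05, 2), LO, HI - 1e-9)

def abstract_transitions(samples_per_unit=50):
    """Estimate unit -> unit transition frequencies per action by sampling.
    Successor states are mapped back onto existing units, so abstract states
    are reused rather than created afresh (keeping the model small)."""
    trans = {}  # (unit, action) -> {successor unit: frequency}
    for unit in policy:
        center = LO + (np.array(unit) + 0.5) * (HI - LO) / N
        for a in ACTIONS:
            succ = {}
            for _ in range(samples_per_unit):
                s2 = unit_of(env_step(center + rng.uniform(-0.05, 0.05, 2), a))
                succ[s2] = succ.get(s2, 0) + 1
            trans[(unit, a)] = {u: c / samples_per_unit for u, c in succ.items()}
    return trans

def unsafe_probability(trans, unsafe):
    """Backward induction: probability of reaching an unsafe unit within
    HORIZON steps under the unit-level stochastic policy. This sampled
    estimate is not a sound bound; the paper derives upper bounds by
    model checking the abstract MDP instead."""
    p = {u: 1.0 if u in unsafe else 0.0 for u in policy}
    for _ in range(HORIZON):
        p_new = {}
        for u in policy:
            if u in unsafe:
                p_new[u] = 1.0
                continue
            p_new[u] = sum(policy[u][a] * sum(pr * p[v] for v, pr in
                           trans[(u, a)].items()) for a in ACTIONS)
        p = p_new
    return p

trans = abstract_transitions()
unsafe = {u for u in policy if u[0] == N - 1}   # rightmost column is unsafe
p = unsafe_probability(trans, unsafe)
print(f"P(unsafe within {HORIZON} steps from unit (0, 4)) = {p[(0, 4)]:.3f}")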

Original language: English
Title of host publication: Verification, Model Checking, and Abstract Interpretation - 26th International Conference, VMCAI 2025, Proceedings
Editors: Krishna Shankaranarayanan, Sriram Sankaranarayanan, Ashutosh Trivedi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 125-147
Number of pages: 23
ISBN (Print): 9783031826993
DOIs
State: Published - 2025
Event: 26th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI 2025 - Denver, United States
Duration: 20 Jan 2025 - 21 Jan 2025

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15529 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI 2025
Country/Territory: United States
City: Denver
Period: 20/01/25 - 21/01/25

Keywords

  • Abstract training
  • Formal quantitative verification
  • Probabilistic deep reinforcement learning
  • Safety-critical properties
