TY - GEN
T1 - Formal Verification of Probabilistic Deep Reinforcement Learning Policies with Abstract Training
AU - Yang, Junfeng
AU - Zhang, Min
AU - Chen, Xin
AU - Li, Qin
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Deep Reinforcement Learning (DRL), especially DRL with probabilistic policies, has shown great potential in learning control policies. In safety-critical domains, using a probabilistic DRL policy requires strict safety assurances, making it critical to verify such policies formally. However, formal verification of probabilistic DRL policies still faces significant challenges. These challenges arise from the complexity of reasoning about the neural network’s probabilistic outputs over infinite state sets and from the state explosion problem during model construction. This paper proposes a novel approach based on abstract training for quantitatively verifying probabilistic DRL policies. Specifically, we abstract the infinite continuous state space into finitely many discrete decision units and train a deep neural network (DNN) policy on these decision units. This abstract training allows the probabilistic decision outputs for a set of states to be computed directly in a black-box manner, greatly reducing the complexity of reasoning about neural network outputs. We further abstract the execution of the trained DNN policy as a Markov decision process (MDP) and perform probabilistic model checking, obtaining two types of upper bounds on the probability of being unsafe. When constructing the MDP, we reuse abstract states based on decision units, significantly alleviating the state explosion problem. Experiments show that the proposed quantitative verification approach yields tighter upper bounds on unsafe probabilities over longer time horizons, more easily and efficiently than the current state-of-the-art method.
AB - Deep Reinforcement Learning (DRL), especially DRL with probabilistic policies, has shown great potential in learning control policies. In safety-critical domains, using a probabilistic DRL policy requires strict safety assurances, making it critical to verify such policies formally. However, formal verification of probabilistic DRL policies still faces significant challenges. These challenges arise from the complexity of reasoning about the neural network’s probabilistic outputs over infinite state sets and from the state explosion problem during model construction. This paper proposes a novel approach based on abstract training for quantitatively verifying probabilistic DRL policies. Specifically, we abstract the infinite continuous state space into finitely many discrete decision units and train a deep neural network (DNN) policy on these decision units. This abstract training allows the probabilistic decision outputs for a set of states to be computed directly in a black-box manner, greatly reducing the complexity of reasoning about neural network outputs. We further abstract the execution of the trained DNN policy as a Markov decision process (MDP) and perform probabilistic model checking, obtaining two types of upper bounds on the probability of being unsafe. When constructing the MDP, we reuse abstract states based on decision units, significantly alleviating the state explosion problem. Experiments show that the proposed quantitative verification approach yields tighter upper bounds on unsafe probabilities over longer time horizons, more easily and efficiently than the current state-of-the-art method.
KW - Abstract training
KW - Formal quantitative verification
KW - Probabilistic deep reinforcement learning
KW - Safety-critical properties
UR - https://www.scopus.com/pages/publications/85218452323
U2 - 10.1007/978-3-031-82700-6_6
DO - 10.1007/978-3-031-82700-6_6
M3 - Conference contribution
AN - SCOPUS:85218452323
SN - 9783031826993
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 125
EP - 147
BT - Verification, Model Checking, and Abstract Interpretation - 26th International Conference, VMCAI 2025, Proceedings
A2 - Shankaranarayanan, Krishna
A2 - Sankaranarayanan, Sriram
A2 - Trivedi, Ashutosh
PB - Springer Science and Business Media Deutschland GmbH
T2 - 26th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI 2025
Y2 - 20 January 2025 through 21 January 2025
ER -