TY - GEN
T1 - Safe Reinforcement Learning for CPSs via Formal Modeling and Verification
AU - Yang, Chenchen
AU - Liu, Jing
AU - Sun, Haiying
AU - Sun, Junfeng
AU - Chen, Xiang
AU - Zhang, Lipeng
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/7/18
Y1 - 2021/7/18
N2 - Reinforcement learning (RL) learns policies that maximize expected reward and has shown success in complex decision-making tasks. However, RL-based controllers do not provide safety guarantees for the physical models in cyber-physical systems (CPSs). In this paper, we propose a framework that applies RL to safe control systems by transferring formal analysis results to the learned policy. For satisfaction verification and quantitative analysis, we propose an uncertainty modeling language, CSML, to describe system behaviors, and transform CSML designs into networks of probabilistic timed automata (NPTA). For safe learning, we present an algorithm called Safe Control with Formal Methods (SCFM). Before learning, SCFM explores the state space to construct a set of states satisfying constraints expressed in probabilistic computation tree logic (PCTL). A monitor observes the system, determines whether each chosen action is safe, and corrects unsafe decisions. We validate our method through lane-change control experiments for autonomous cars.
AB - Reinforcement learning (RL) learns policies that maximize expected reward and has shown success in complex decision-making tasks. However, RL-based controllers do not provide safety guarantees for the physical models in cyber-physical systems (CPSs). In this paper, we propose a framework that applies RL to safe control systems by transferring formal analysis results to the learned policy. For satisfaction verification and quantitative analysis, we propose an uncertainty modeling language, CSML, to describe system behaviors, and transform CSML designs into networks of probabilistic timed automata (NPTA). For safe learning, we present an algorithm called Safe Control with Formal Methods (SCFM). Before learning, SCFM explores the state space to construct a set of states satisfying constraints expressed in probabilistic computation tree logic (PCTL). A monitor observes the system, determines whether each chosen action is safe, and corrects unsafe decisions. We validate our method through lane-change control experiments for autonomous cars.
KW - CSML
KW - Networks of probabilistic timed automata
KW - Probabilistic computation tree logic
KW - Reinforcement learning
KW - Safe control
UR - https://www.scopus.com/pages/publications/85116436716
U2 - 10.1109/IJCNN52387.2021.9533979
DO - 10.1109/IJCNN52387.2021.9533979
M3 - Conference contribution
AN - SCOPUS:85116436716
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - IJCNN 2021 - International Joint Conference on Neural Networks, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 International Joint Conference on Neural Networks, IJCNN 2021
Y2 - 18 July 2021 through 22 July 2021
ER -