Safe Reinforcement Learning via Probabilistic Timed Computation Tree Logic

  • Li Qian
  • Jing Liu*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

6 Scopus citations

Abstract

Reinforcement learning aims to discover an optimal policy that maximizes reward based on a feedback signal. Although the method has succeeded in numerous systems, it may not be applicable to safety-critical systems because it lacks a safety protection mechanism. Moreover, the agent cannot model the environment accurately if its observations are biased. We present a safe algorithm called Safe Control with Supervisor (SCS) to address these limitations. If the model is accurate, the supervisor monitors the system and repairs the agent's actions at runtime, guiding the system to obey a specification written in probabilistic timed Computation Tree Logic (ptCTL). If not, the supervisor maximizes the probability of satisfying a given task specification. We validate our method through experiments on adaptive cruise control under uncertainty.
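The abstract does not detail how the supervisor repairs actions. As a rough illustration of the general runtime-supervision ("shielding") pattern only, and not of the SCS algorithm itself, the sketch below assumes a known finite MDP and replaces the dense-time ptCTL property with a discrete step-bounded safety property, P >= p [G<=k not unsafe]. The bounded probabilities are computed by backward value iteration. All names (`safety_values`, `q_value`, `supervise`) and the toy states, actions, and probabilities are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model format: mdp[state][action] -> [(next_state, probability), ...].
# Every state needs at least one action, and every successor must be a key of mdp.
MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]


def safety_values(mdp: MDP, unsafe: Set[str], horizon: int) -> List[Dict[str, float]]:
    """values[k][s]: maximum probability of avoiding `unsafe` for k more steps from s."""
    values = [{s: 0.0 if s in unsafe else 1.0 for s in mdp}]
    for _ in range(horizon):
        prev = values[-1]
        values.append({
            s: 0.0 if s in unsafe else max(
                sum(p * prev[t] for t, p in succ) for succ in acts.values()
            )
            for s, acts in mdp.items()
        })
    return values


def q_value(mdp: MDP, unsafe: Set[str], values: List[Dict[str, float]],
            s: str, a: str, steps: int) -> float:
    """Probability of staying safe for `steps` steps if `a` is taken now
    and the safest action is taken afterwards."""
    if s in unsafe:
        return 0.0
    return sum(p * values[steps - 1][t] for t, p in mdp[s][a])


def supervise(mdp: MDP, unsafe: Set[str], values: List[Dict[str, float]],
              s: str, proposed: str, steps: int, threshold: float) -> str:
    """Keep the agent's action if it meets P >= threshold [G<=steps safe];
    otherwise repair it with the action of maximal safety probability."""
    if not 1 <= steps < len(values):
        raise ValueError("steps must lie in 1..horizon")
    q = {a: q_value(mdp, unsafe, values, s, a, steps) for a in mdp[s]}
    if q[proposed] >= threshold:
        return proposed
    # The argmax meets the threshold whenever any action does; if none does,
    # it still maximises the probability of satisfying the property.
    return max(q, key=q.get)


# Toy three-state model (hypothetical numbers, not taken from the paper).
mdp: MDP = {
    "far": {"accelerate": [("far", 0.6), ("near", 0.4)],
            "keep": [("far", 0.9), ("near", 0.1)],
            "brake": [("far", 1.0)]},
    "near": {"accelerate": [("near", 0.5), ("collision", 0.5)],
             "keep": [("near", 0.8), ("collision", 0.2)],
             "brake": [("far", 0.7), ("near", 0.2), ("collision", 0.1)]},
    "collision": {"stay": [("collision", 1.0)]},
}
unsafe = {"collision"}
values = safety_values(mdp, unsafe, horizon=3)
print(supervise(mdp, unsafe, values, "far", "accelerate", 3, 0.95))  # accelerate
print(supervise(mdp, unsafe, values, "far", "accelerate", 3, 0.97))  # brake
```

Here the agent's "accelerate" has a 3-step safety probability of about 0.952, so it passes a 0.95 bound but is repaired to "brake" under a 0.97 bound. The step count is only a stand-in for ptCTL's clock-based time bounds, and the fallback branch differs from the abstract, where the probability-maximizing mode is triggered by an inaccurate model rather than by an infeasible threshold.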

Original language: English
Title of host publication: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169262
State: Published - Jul 2020
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: 19 Jul 2020 to 24 Jul 2020

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country/Territory: United Kingdom
City: Virtual, Glasgow
Period: 19/07/20 to 24/07/20

Keywords

  • Probabilistic timed computation tree logic
  • Reinforcement learning
  • Safe control
