Dependable Reinforcement Learning via Timed Differential Dynamic Logic

  • Runhao Wang
  • Yuhong Zhang
  • Haiying Sun
  • Jing Liu*

* Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Reinforcement learning algorithms discover policies that are lauded for their high efficiency but do not necessarily guarantee safety. We introduce a new approach that provides the best of both worlds: learning optimal policies while requiring the system to comply with a given model, which keeps the learning dependable. To this end, we propose Timed Differential Dynamic Logic to express the system properties. Our main insight is to convert these properties into runtime monitors and use them to check whether the system is correctly modeled. We choose the optimal policies only if reality matches the model; otherwise, we abandon efficiency and instead choose a policy that guides the agent to a modeled portion of the state space. We also propose the Dependable Mixed Control (DMC) algorithm, which implements this framework for practical application. Finally, we validate the effectiveness of our approach through a case study on Communication-Based Autonomous Control (CBAC).
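
The monitor-then-choose idea in the abstract can be illustrated with a minimal sketch. This is not the paper's DMC algorithm or its Timed Differential Dynamic Logic monitors: the kinematic model, the tolerance `tol`, and the names `model_step`, `monitor_ok`, and `mixed_control` are all hypothetical, chosen only to show how a runtime check might switch between a learned policy and a fallback policy.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # hypothetical state: (position, velocity)
Action = float               # hypothetical action: acceleration


def model_step(state: State, action: Action, dt: float) -> State:
    """Hypothetical system model: constant acceleration over one step of length dt."""
    pos, vel = state
    return (pos + vel * dt + 0.5 * action * dt * dt, vel + action * dt)


def monitor_ok(prev: State, action: Action, observed: State,
               dt: float, tol: float = 1e-2) -> bool:
    """Runtime monitor (assumed form): the observed transition must match
    the model's prediction within tol in every state component."""
    predicted = model_step(prev, action, dt)
    return all(abs(p - o) <= tol for p, o in zip(predicted, observed))


def mixed_control(prev: State, prev_action: Action, observed: State, dt: float,
                  learned_policy: Callable[[State], Action],
                  fallback_policy: Callable[[State], Action]) -> Action:
    """Use the learned policy only while reality matches the model;
    otherwise use a fallback meant to steer back to modeled states."""
    if monitor_ok(prev, prev_action, observed, dt):
        return learned_policy(observed)
    return fallback_policy(observed)
```

For example, with a learned policy that always accelerates and a fallback that always brakes, an observation that agrees with the model selects the learned action, while an observation that deviates from the prediction by more than `tol` selects the fallback action.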

Original language: English
Title of host publication: 26th IEEE Symposium on Computers and Communications, ISCC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665427449
State: Published - 2021
Event: 26th IEEE Symposium on Computers and Communications, ISCC 2021 - Athens, Greece
Duration: 5 Sep 2021 → 8 Sep 2021

Publication series

Name: Proceedings - IEEE Symposium on Computers and Communications
Volume: 2021-September
ISSN (Print): 1530-1346

Conference

Conference: 26th IEEE Symposium on Computers and Communications, ISCC 2021
Country/Territory: Greece
City: Athens
Period: 5/09/21 → 8/09/21

Keywords

  • Reinforcement learning
  • Safe control
  • Timed Differential Dynamic Logic
