Reward Translation via Reward Machine in Semi-Alignable MDPs

  • Yun Hua
  • Haosheng Chen
  • Wenhao Li
  • Bo Jin
  • Baoxiang Wang
  • Hongyuan Zha
  • Xiangfeng Wang*

  *Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Knowledge transfer across domains can ease the complexity of reward design in deep reinforcement learning. To this end, we define reward translation to describe the cross-domain reward transfer problem. However, current methods struggle with incompatible MDPs that are non-pairable and non-time-alignable. This paper presents an adaptable reward translation framework, neural reward translation, featuring semi-alignable MDPs, which allows efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Given the inherent difficulty of directly mapping semi-alignable MDPs and transferring rewards, we introduce an indirect mapping method through reward machines, created using limited human input or LLM-based automated learning. Graph-matching techniques establish links between reward machines from distinct environments, thus enabling cross-domain reward translation within semi-alignable MDP settings. This broadens the applicability of DRL across multiple domains. Experiments substantiate our approach's effectiveness in tasks under environments with semi-alignable MDPs.
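The reward machine underlying this approach can be pictured as a small finite-state machine whose transitions fire on high-level events and emit rewards. The sketch below is a minimal illustration of that general concept, not the paper's implementation; all class, state, and event names are hypothetical.

```python
# Minimal reward-machine sketch (illustrative only, not the paper's code).
# A reward machine tracks task progress as an automaton state; each
# (state, event) transition emits a scalar reward.

class RewardMachine:
    def __init__(self, initial_state, transitions):
        # transitions: {(state, event): (next_state, reward)}
        self.state = initial_state
        self.transitions = transitions

    def step(self, event):
        """Advance on an observed event and return the emitted reward."""
        # Unmodeled events leave the state unchanged and give zero reward.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

# Toy task: pick up a key, then open a door.
rm = RewardMachine("u0", {
    ("u0", "key"): ("u1", 0.0),
    ("u1", "door"): ("u2", 1.0),
})
print(rm.step("door"))  # 0.0 — door before key yields nothing
print(rm.step("key"))   # 0.0
print(rm.step("door"))  # 1.0 — task complete
```

Because such machines abstract away low-level state and timing, two environments with incompatible MDPs can still expose structurally similar automata, which is what makes graph matching between them plausible.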

Original language: English
Pages (from-to): 24912-24931
Number of pages: 20
Journal: Proceedings of Machine Learning Research
Volume: 267
State: Published - 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 - 19 Jul 2025
