TY - JOUR
T1 - Reward Translation via Reward Machine in Semi-Alignable MDPs
AU - Hua, Yun
AU - Chen, Haosheng
AU - Li, Wenhao
AU - Jin, Bo
AU - Wang, Baoxiang
AU - Zha, Hongyuan
AU - Wang, Xiangfeng
N1 - Publisher Copyright:
© 2020 by the authors.
PY - 2025
Y1 - 2025
N2 - Addressing reward design complexities in deep reinforcement learning is facilitated by knowledge transfer across different domains. To this end, we define reward translation to describe the cross-domain reward transfer problem. However, current methods struggle with non-pairable and non-time-alignable incompatible MDPs. This paper presents an adaptable reward translation framework, neural reward translation, featuring semi-alignable MDPs, which allows efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Given the inherent difficulty of directly mapping semi-alignable MDPs and transferring rewards, we introduce an indirect mapping method through reward machines, created using limited human input or LLM-based automated learning. Graph-matching techniques establish links between reward machines from distinct environments, thus enabling cross-domain reward translation within semi-alignable MDP settings. This broadens the applicability of DRL across multiple domains. Experiments substantiate our approach’s effectiveness in tasks under environments with semi-alignable MDPs.
AB - Addressing reward design complexities in deep reinforcement learning is facilitated by knowledge transfer across different domains. To this end, we define reward translation to describe the cross-domain reward transfer problem. However, current methods struggle with non-pairable and non-time-alignable incompatible MDPs. This paper presents an adaptable reward translation framework, neural reward translation, featuring semi-alignable MDPs, which allows efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Given the inherent difficulty of directly mapping semi-alignable MDPs and transferring rewards, we introduce an indirect mapping method through reward machines, created using limited human input or LLM-based automated learning. Graph-matching techniques establish links between reward machines from distinct environments, thus enabling cross-domain reward translation within semi-alignable MDP settings. This broadens the applicability of DRL across multiple domains. Experiments substantiate our approach’s effectiveness in tasks under environments with semi-alignable MDPs.
UR - https://www.scopus.com/pages/publications/105023577249
M3 - Conference article
AN - SCOPUS:105023577249
SN - 2640-3498
VL - 267
SP - 24912
EP - 24931
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 42nd International Conference on Machine Learning, ICML 2025
Y2 - 13 July 2025 through 19 July 2025
ER -