Hierarchical Diffusion for Offline Decision Making

Wenhao Li, Xiangfeng Wang, Bo Jin, Hongyuan Zha

Research output: Contribution to journal › Conference article › peer-review

4 Scopus citations

Abstract

Offline reinforcement learning typically introduces a hierarchical structure to solve the long-horizon problem and thereby address its thorny issue of variance accumulation. The problems of the deadly triad, limited data, and reward sparsity, however, still remain, rendering the design of effective hierarchical offline RL algorithms for general-purpose policy learning a formidable challenge. In this paper, we first formulate the problem of offline long-horizon decision-making from the perspective of conditional generative modeling by incorporating goals into the control-as-inference graphical models. A Hierarchical trajectory-level Diffusion probabilistic model (HDMI) is then proposed with classifier-free guidance. HDMI employs a cascade framework that utilizes a reward-conditional goal diffuser for subgoal discovery and a goal-conditional trajectory diffuser for generating the corresponding action sequences for the subgoals. Planning-based subgoal extraction and transformer-based diffusion are employed to deal with sub-optimal data pollution and long-range subgoal dependencies in the goal diffusion. Numerical experiments verify the advantages of HDMI for long-horizon decision-making compared to SOTA offline RL methods and conditional generative models.
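The cascade the abstract describes — a reward-conditional goal diffuser proposing subgoals, then a goal-conditional trajectory diffuser filling in actions, both steered by classifier-free guidance — can be sketched at a high level. Everything below is an illustrative placeholder, not the paper's actual models: the denoisers are toy linear functions, the reverse-diffusion update drops the real noise schedule, and the guidance weight `w` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for HDMI's two diffusers; the real models are
# transformer-based denoising networks trained on offline trajectories.
def goal_denoiser(subgoals, t, reward=None):
    # Predicts noise for the subgoal sequence; reward=None is the
    # unconditional branch used by classifier-free guidance.
    return 0.1 * subgoals if reward is None else 0.1 * subgoals - 0.01 * reward

def trajectory_denoiser(actions, t, subgoal):
    # Predicts noise for an action sequence conditioned on one subgoal.
    return 0.1 * actions - 0.05 * subgoal

def cfg_denoise(x, t, reward, w=2.0):
    # Classifier-free guidance: push the unconditional prediction
    # toward the reward-conditional one by guidance weight w.
    eps_uncond = goal_denoiser(x, t)
    eps_cond = goal_denoiser(x, t, reward=reward)
    return eps_uncond + w * (eps_cond - eps_uncond)

def sample(denoise_fn, shape, steps=10):
    # Simplified reverse-diffusion loop: start from Gaussian noise and
    # iteratively subtract the predicted noise (no variance schedule).
    x = rng.standard_normal(shape)
    for t in reversed(range(steps)):
        x = x - denoise_fn(x, t)
    return x

# Stage 1: reward-conditional goal diffuser discovers a subgoal sequence.
subgoals = sample(lambda x, t: cfg_denoise(x, t, reward=1.0), shape=(4, 2))

# Stage 2: goal-conditional trajectory diffuser generates an action
# sequence for each discovered subgoal.
actions = [sample(lambda x, t, g=g: trajectory_denoiser(x, t, g), shape=(8, 2))
           for g in subgoals]
```

The two-stage structure is the point: the outer diffuser plans in subgoal space, and only the inner diffuser touches raw actions, which is how the cascade shortens the effective horizon each model must handle.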

Original language: English
Pages (from-to): 19425-19439
Number of pages: 15
Journal: Proceedings of Machine Learning Research
Volume: 202
State: Published - 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 – 29 Jul 2023
