HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

In spite of the success of existing meta reinforcement learning methods, they still have difficulty learning an effective meta policy for RL problems with sparse reward. To address this, we develop a novel meta reinforcement learning framework called Hyper-Meta RL (HMRL) for sparse-reward RL problems. It consists of three modules: a cross-environment meta state embedding module, which constructs a common meta state space to adapt to different environments; an environment-specific meta reward shaping module, which effectively extends the original sparse-reward trajectory through cross-environment knowledge complementarity; and a meta policy learning module, through which the meta policy achieves better generalization and efficiency with the shaped meta reward. Experiments in sparse-reward environments show the superiority of HMRL in both transferability and policy learning efficiency.
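To make the module decomposition in the abstract concrete, the following is a minimal sketch of how the three components could fit together. All class names, network sizes, and the shaping coefficient are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the three-module structure described in the abstract.
# Module names, architectures, and the shaping coefficient are assumptions.
import torch
import torch.nn as nn

class MetaStateEmbedding(nn.Module):
    """Maps environment-specific states into a shared meta state space."""
    def __init__(self, state_dim: int, meta_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, meta_dim))

    def forward(self, state):
        return self.net(state)

class MetaRewardShaper(nn.Module):
    """Predicts a dense shaping bonus from the meta state to supplement
    the sparse extrinsic reward."""
    def __init__(self, meta_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(meta_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, meta_state):
        return self.net(meta_state).squeeze(-1)

class MetaPolicy(nn.Module):
    """Policy trained on the shaped reward over the shared meta state space."""
    def __init__(self, meta_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(meta_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, meta_state):
        return torch.distributions.Categorical(logits=self.net(meta_state))

# One transition: embed the raw state, shape the sparse reward, pick an action.
embed = MetaStateEmbedding(state_dim=8, meta_dim=16)
shaper = MetaRewardShaper(meta_dim=16)
policy = MetaPolicy(meta_dim=16, n_actions=4)

state = torch.randn(1, 8)                        # environment-specific observation
meta_state = embed(state)                        # common meta state
sparse_r = torch.tensor([0.0])                   # extrinsic reward, usually zero
shaped_r = sparse_r + 0.1 * shaper(meta_state)   # dense signal (0.1 is an assumed weight)
action = policy(meta_state).sample()
```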

Original language: English
Title of host publication: KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 637-645
Number of pages: 9
ISBN (Electronic): 9781450383325
DOIs
State: Published - 14 Aug 2021
Event: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021 - Virtual, Online, Singapore
Duration: 14 Aug 2021 - 18 Aug 2021

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021
Country/Territory: Singapore
City: Virtual, Online
Period: 14/08/21 - 18/08/21

Keywords

  • meta learning
  • reinforcement learning
  • sparse reward
