InCLET: Large Language Model In-context Learning can Improve Embodied Instruction-following

  • Peng Yuan Wang
  • Jing Cheng Pang
  • Chen Yang Wang
  • Xuhui Liu
  • Tian Shuo Liu
  • Si Hang Yang
  • Hong Qian
  • Yang Yu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Natural language-conditioned reinforcement learning (NLC-RL) empowers embodied agents to complete various tasks following human instructions. However, unbounded natural language still introduces considerable complexity for an agent solving concrete RL tasks, which can distract policy learning from completing the task. Consequently, extracting an effective task representation from the human instruction emerges as a critical component of NLC-RL. While previous methods have attempted to address this issue by learning task-related representations with large language models (LLMs), they rely heavily on pre-collected task data and require an extra training procedure. In this study, we uncover the inherent capability of LLMs to generate task representations and present a novel method, in-context learning embedding as task representation (InCLET). InCLET is grounded in the foundational finding that LLM in-context learning over trajectories can greatly help represent tasks. We thus first employ the LLM to imagine task trajectories following the natural language instruction, then use the LLM's in-context learning to generate task representations, and finally aggregate and project them into a compact low-dimensional task representation. This representation is then used to train a human instruction-following agent. We conduct experiments on various embodied control environments, and the results show that InCLET creates effective task representations. Furthermore, this representation significantly improves RL training efficiency compared to the baseline methods.
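The three-step pipeline the abstract describes (imagine trajectories, embed them via LLM in-context learning, then aggregate and project) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed_trajectory` is a hash-based stand-in for the actual LLM embedding call, and the projection matrix is random here rather than learned.

```python
import numpy as np

def embed_trajectory(trajectory_text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical stand-in for an LLM in-context-learning embedding:
    # a hash-seeded pseudo-embedding so the sketch runs without an LLM.
    rng = np.random.default_rng(abs(hash(trajectory_text)) % (2**32))
    return rng.standard_normal(dim)

def inclet_task_representation(imagined_trajectories, proj, dim=64):
    # Step 2: embed each imagined trajectory (mock in-context learning).
    embeddings = np.stack([embed_trajectory(t, dim) for t in imagined_trajectories])
    # Step 3: aggregate (mean-pool here) and project to a compact,
    # low-dimensional task representation for the RL policy.
    aggregated = embeddings.mean(axis=0)
    return proj @ aggregated

# Step 1 (assumed): an LLM imagines trajectories for the instruction
# "pick up the red block"; these strings are illustrative only.
trajectories = [
    "move to the red block, grasp it, lift",
    "approach the red block, close gripper, raise arm",
]
proj = np.random.default_rng(0).standard_normal((8, 64))  # learned in the paper
z = inclet_task_representation(trajectories, proj)
print(z.shape)  # (8,)
```

The compact vector `z` would then condition the instruction-following policy during RL training, replacing the raw natural-language instruction.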

Original language: English
Title of host publication: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025
Editors: Yevgeniy Vorobeychik, Sanmay Das, Ann Nowe
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 2134-2142
Number of pages: 9
ISBN (Electronic): 9798400714269
State: Published - 2025
Event: 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025 - Detroit, United States
Duration: 19 May 2025 - 23 May 2025

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914

Conference

Conference: 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025
Country/Territory: United States
City: Detroit
Period: 19/05/25 - 23/05/25

Keywords

  • Embodiment Agent
  • In-context Learning
  • Reinforcement Learning
