FLIP: Adaptive Comparison Method Selection for Efficient Preference-Based Reinforcement Learning

  • Ziang Liu
  • Xingjiao Wu*
  • Hongxin Chen
  • Luwei Xiao
  • Jing Yang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Preference-based Reinforcement Learning (PBRL) relies on the efficient collection and use of preference data to train accurate reward functions, enabling agents to learn directly from human preferences. This process allows agents to better understand human intentions while effectively reducing biases inherent in AI systems. The pairwise comparison method gathers diverse preference data, and Seqrank expands preference datasets through transitivity, but both fail to establish preference relationships across different rounds of labeling. This limitation can result in fragmented signals and slow convergence toward the optimal policy. To address this, we propose the Global Tree (GTree), a method built on the Seqrank framework that integrates trajectory preferences across multiple rounds, providing a unified representation of global preferences. Moreover, we posit that different trajectory comparison methods offer distinct advantages depending on the task and the stage of training. To fully exploit these strengths, we introduce FLIP, an adaptive strategy that dynamically selects either the pairwise method or GTree based on historical performance, optimizing method use for each task and training stage. Our evaluations demonstrate that integrating cross-round preferences accelerates the convergence of the reward function, while the FLIP strategy further enhances learning efficiency and overall performance, thereby enabling agents to better understand human intentions.
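The abstract states only that FLIP "dynamically selects either the pairwise method or GTree based on historical performance" without giving the rule. As a hypothetical illustration of that idea (not the paper's actual algorithm), one could frame the choice as an epsilon-greedy decision over each method's empirical success rate; the class name, `epsilon` parameter, and win-rate statistic below are all assumptions:

```python
import random


class AdaptiveSelector:
    """Hypothetical sketch of a FLIP-style selector: pick between two
    trajectory-comparison methods ("pairwise" or "gtree") using an
    epsilon-greedy rule over each method's historical success rate."""

    def __init__(self, methods=("pairwise", "gtree"), epsilon=0.1):
        self.epsilon = epsilon
        self.stats = {m: {"wins": 0, "uses": 0} for m in methods}

    def select(self):
        # Explore a random method occasionally; otherwise exploit the
        # method with the best empirical success rate so far.
        if random.random() < self.epsilon:
            return random.choice(list(self.stats))
        return max(self.stats, key=self._rate)

    def update(self, method, improved):
        # `improved`: True if this labeling round improved the reward
        # model (e.g., higher preference-prediction accuracy).
        self.stats[method]["uses"] += 1
        self.stats[method]["wins"] += int(improved)

    def _rate(self, m):
        s = self.stats[m]
        # Unused methods default to 0.5 so they are not starved early on.
        return s["wins"] / s["uses"] if s["uses"] else 0.5
```

With `epsilon=0.0` the selector is purely greedy, which makes the behavior easy to test; in practice a nonzero epsilon keeps both comparison methods in play as the task and training stage change.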

Original language: English
Title of host publication: International Joint Conference on Neural Networks, IJCNN 2025 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331510428
State: Published - 2025
Event: 2025 International Joint Conference on Neural Networks, IJCNN 2025 - Rome, Italy
Duration: 30 Jun 2025 – 5 Jul 2025

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
ISSN (Print): 2161-4393
ISSN (Electronic): 2161-4407

Conference

Conference: 2025 International Joint Conference on Neural Networks, IJCNN 2025
Country/Territory: Italy
City: Rome
Period: 30/06/25 – 5/07/25

Keywords

  • Human-AI Interaction
  • Preference-based Reinforcement Learning
  • Reinforcement Learning
