Decomposition and Foresight: Comparing Human and Simulated Teacher in Preference-Based Reinforcement Learning

  • Ziang Liu
  • Zihao Zhang
  • Xin Li
  • Xingjiao Wu
  • Mei Xue*
  • *Corresponding author for this work
  • East China Normal University
  • Shanghai AI Laboratory
  • Shanghai University of Electric Power

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Preference-based reinforcement learning (PBRL) algorithms train intelligent agents efficiently by learning reward functions from human preferences, bypassing the need for costly predefined reward functions. However, prior PBRL research has predominantly relied on simulated teachers in place of human preferences, overlooking the fact that no simulated teacher exists for unresolved real-world problems. To apply PBRL effectively to real-world problems, it is essential to investigate how human teachers and simulated teachers differ in their preference selection patterns and in the behaviors exhibited by the agents. Therefore, we propose HPBRL, a novel Human Preference-Based Reinforcement Learning Collaboration prototype, in which the agent learns a flexible reward function from real human preferences. To compare human teachers and simulated teachers comprehensively, we conduct an in-depth analysis through a between-subjects study involving 18 users.

Original language: English
Title of host publication: McGE 2025 - Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation
Subtitle of host publication: New Methods and Practice, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 45-53
Number of pages: 9
ISBN (Electronic): 9798400720604
DOIs
State: Published - 26 Oct 2025
Event: 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, McGE 2025 - Dublin, Ireland
Duration: 31 Oct 2025 - 31 Oct 2025

Publication series

Name: McGE 2025 - Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-Located with MM 2025

Conference

Conference: 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, McGE 2025
Country/Territory: Ireland
City: Dublin
Period: 31/10/25 - 31/10/25

Keywords

  • human-centered computing
  • preference-based reinforcement learning
  • reinforcement learning
