
Decomposition and Foresight: Comparing Human and Simulated Teacher in Preference-Based Reinforcement Learning

  • Ziang Liu
  • Zihao Zhang
  • Xin Li
  • Xingjiao Wu
  • Mei Xue*
  • *Corresponding author for this work
  • East China Normal University
  • Shanghai AI Laboratory
  • Shanghai University of Electric Power

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Preference-based reinforcement learning (PBRL) algorithms train intelligent agents efficiently by learning reward functions from human preferences, bypassing the need for costly pre-existing reward functions. However, prior PBRL research has predominantly relied on simulated teachers to simulate human preferences, overlooking the absence of simulated teachers in unresolved real-world problems. To effectively apply PBRL to real-world problems, it is essential to investigate the distinctions between human teachers and simulated teachers in terms of the preference selection patterns and the behaviors exhibited by the agents. Therefore, we propose HPBRL, a novel Human Preference-Based Reinforcement Learning Collaboration prototype, in which the agent learns a flexible reward function from real human preferences. To facilitate a comprehensive comparison between human teachers and simulated teachers, we conduct an in-depth analysis through a between-subjects study involving 18 users.

Original language: English
Title of host publication: McGE 2025 - Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation
Subtitle of host publication: New Methods and Practice, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 45-53
Number of pages: 9
ISBN (electronic): 9798400720604
DOI
Publication status: Published - 26 Oct 2025
Event: 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, McGE 2025 - Dublin, Ireland
Duration: 31 Oct 2025 to 31 Oct 2025

Publication series

Name: McGE 2025 - Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-Located with MM 2025

Conference

Conference: 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, McGE 2025
Country/Territory: Ireland
City: Dublin
Period: 31/10/25 to 31/10/25
