TY - GEN
T1 - Decomposition and Foresight
T2 - 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, McGE 2025
AU - Liu, Ziang
AU - Zhang, Zihao
AU - Li, Xin
AU - Wu, Xingjiao
AU - Xue, Mei
N1 - Publisher Copyright:
© 2025 ACM.
PY - 2025/10/26
Y1 - 2025/10/26
N2 - Preference-based reinforcement learning (PBRL) algorithms train intelligent agents efficiently by learning reward functions from human preferences, bypassing the need for costly predefined reward functions. However, prior PBRL research has relied predominantly on simulated teachers to mimic human preferences, overlooking the fact that no such simulated teacher exists for unsolved real-world problems. To apply PBRL effectively to real-world problems, it is essential to investigate how human teachers and simulated teachers differ, both in their preference selection patterns and in the behaviors of the agents they train. Therefore, we propose HPBRL, a novel Human Preference-Based Reinforcement Learning Collaboration prototype in which the agent learns a flexible reward function from real human preferences. To compare human teachers and simulated teachers comprehensively, we conduct an in-depth analysis through a between-subjects study involving 18 users.
AB - Preference-based reinforcement learning (PBRL) algorithms train intelligent agents efficiently by learning reward functions from human preferences, bypassing the need for costly predefined reward functions. However, prior PBRL research has relied predominantly on simulated teachers to mimic human preferences, overlooking the fact that no such simulated teacher exists for unsolved real-world problems. To apply PBRL effectively to real-world problems, it is essential to investigate how human teachers and simulated teachers differ, both in their preference selection patterns and in the behaviors of the agents they train. Therefore, we propose HPBRL, a novel Human Preference-Based Reinforcement Learning Collaboration prototype in which the agent learns a flexible reward function from real human preferences. To compare human teachers and simulated teachers comprehensively, we conduct an in-depth analysis through a between-subjects study involving 18 users.
KW - human-centered computing
KW - preference-based reinforcement learning
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/105028990754
U2 - 10.1145/3746278.3759381
DO - 10.1145/3746278.3759381
M3 - Conference contribution
AN - SCOPUS:105028990754
T3 - McGE 2025 - Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-Located with MM 2025
SP - 45
EP - 53
BT - McGE 2025 - Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation
PB - Association for Computing Machinery, Inc
Y2 - 31 October 2025 through 31 October 2025
ER -