TY - GEN
T1 - MindChat-R0
T2 - 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp Companion 2025
AU - She, Dong
AU - Zhang, Chenxu
AU - Yao, Xianrong
AU - Gao, Yang
AU - Jin, Zhanpeng
N1 - Publisher Copyright: © 2025 ACM.
PY - 2025/12/29
Y1 - 2025/12/29
AB - Emotional Support Conversation (ESC) systems are critical for assisting individuals facing mental health challenges. In this work, we present a reinforcement learning framework to improve ESC systems through structured emotional reasoning. We first collect and clean a dataset of 4,500 real-world support-seeking posts. To guide emotional generation, we introduce Empathetic Chain-of-Thought (ECoT), a structured reasoning format that encourages multi-turn empathy and coherence. Based on this, we train MindChat-R0 (with Qwen3-8B as the base model), a Chinese empathetic dialogue agent, using reinforcement learning optimized by ECoT-driven reward signals. LLM-as-a-judge evaluation shows that MindChat-R0 achieves the highest average score of 3.863 out of 5.0 across fluency, empathy, and support dimensions (vs. 2.834 for Qwen3-8B-nothink and 2.547 for Qwen3-8B-think). In human preference evaluation, MindChat-R0 also outperforms strong baselines with a win rate of 71.14%, based on pairwise comparisons by human annotators.
KW - chain-of-thought reasoning
KW - emotional support conversation
KW - empathetic dialogue systems
KW - large language model
KW - mental health
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/105027027904
U2 - 10.1145/3714394.3756244
DO - 10.1145/3714394.3756244
M3 - Conference contribution
AN - SCOPUS:105027027904
T3 - UbiComp Companion 2025 - Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing
SP - 1209
EP - 1216
BT - UbiComp Companion 2025 - Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing
A2 - Beigl, Michael
A2 - Jacucci, Giulio
A2 - Sigg, Stephan
A2 - Xiao, Yu
A2 - Bardram, Jakob E.
A2 - Tsiropoulou, Eirini Eleni
A2 - Xu, Chenren
PB - Association for Computing Machinery, Inc
Y2 - 12 October 2025 through 16 October 2025
ER -