MindChat-R0: A Large Language Model for Emotionally Supportive Dialogue through Reinforcement Learning

  • Dong She*
  • Chenxu Zhang
  • Xianrong Yao
  • Yang Gao
  • Zhanpeng Jin*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Emotional Support Conversation (ESC) systems are critical for assisting individuals facing mental health challenges. In this work, we present a reinforcement learning framework to improve ESC systems through structured emotional reasoning. We first collect and clean a dataset of 4,500 real-world support-seeking posts. To guide emotional generation, we introduce Empathetic Chain-of-Thought (ECoT), a structured reasoning format that encourages multi-turn empathy and coherence. Based on this, we train MindChat-R0 (with Qwen3-8B as the base model), a Chinese empathetic dialogue agent, using reinforcement learning optimized by ECoT-driven reward signals. LLM-as-a-judge evaluation shows that MindChat-R0 achieves the highest average score of 3.863 out of 5.0 across fluency, empathy, and support dimensions (vs. 2.834 for Qwen3-8B-nothink and 2.547 for Qwen3-8B-think). In human preference evaluation, MindChat-R0 also outperforms strong baselines with a win rate of 71.14%, based on pairwise comparisons by human annotators.

Original language: English
Title of host publication: UbiComp Companion 2025 - Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing
Editors: Michael Beigl, Giulio Jacucci, Stephan Sigg, Yu Xiao, Jakob E. Bardram, Eirini Eleni Tsiropoulou, Chenren Xu
Publisher: Association for Computing Machinery, Inc
Pages: 1209-1216
Number of pages: 8
ISBN (Electronic): 9798400714771
DOIs
State: Published - 29 Dec 2025
Externally published: Yes
Event: 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp Companion 2025 - Espoo, Finland
Duration: 12 Oct 2025 - 16 Oct 2025

Publication series

Name: UbiComp Companion 2025 - Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing

Conference

Conference: 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp Companion 2025
Country/Territory: Finland
City: Espoo
Period: 12/10/25 - 16/10/25

Keywords

  • chain-of-thought reasoning
  • emotional support conversation
  • empathetic dialogue systems
  • large language model
  • mental health
  • reinforcement learning
