Evaluating Psychological Competency via Chinese Q&A in Large Language Models

  • Feng Gao
  • Yishen He
  • Qin Chen*
  • Feng Liu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, the application of large language models (LLMs) in psychology has gained increasing attention. However, their psychological competence still requires further investigation. This study explores this issue through the lens of Chinese psychological knowledge question answering (QA). Specifically, we constructed a dedicated dataset based on Chinese qualification examinations for psychological counselors and psychotherapists. We then evaluated dense, Mixture-of-Experts, and reasoning LLMs of varying parameter sizes under different evaluation modes in the Chinese context, measuring answer accuracy in both closed-ended and open-ended settings. The experimental results showed that larger and more recent LLMs achieved higher accuracy in psychological QA. While few-shot learning improved accuracy, Chain-of-Thought prompting and reasoning LLMs provided only limited gains. Notably, LLMs achieved higher accuracy in closed-ended settings than in open-ended ones. Furthermore, error analysis indicated that LLMs can produce incorrect or hallucinated responses, primarily due to insufficient psychological knowledge and conceptual confusion. Although current LLMs show promise in psychological QA tasks, users should remain cautious about over-reliance on their responses; a complementary, human-AI collaborative approach is recommended for practical use.
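In the closed-ended setting described above, evaluation reduces to multiple-choice accuracy scoring, with few-shot evaluation amounting to prepending exemplar Q&A pairs to each prompt. The sketch below is a minimal illustration of such a loop, not the paper's actual harness: the query_model callable, the item field names, and the answer-extraction regex are all assumptions made for the example.

    import re

    def extract_choice(response: str):
        """Pull the first standalone option letter (A-D) from a model response."""
        match = re.search(r"\b([A-D])\b", response)
        return match.group(1) if match else None

    def closed_ended_accuracy(items, query_model, shots=()):
        """Score accuracy over items like {"question": ..., "options": ..., "answer": "B"}.

        `shots` holds optional few-shot exemplars in the same format; they are
        prepended to every prompt as worked Q&A pairs.
        """
        correct = 0
        for item in items:
            prompt = "".join(
                f"Q: {s['question']}\nOptions: {s['options']}\nA: {s['answer']}\n\n"
                for s in shots
            )
            prompt += f"Q: {item['question']}\nOptions: {item['options']}\nA:"
            prediction = extract_choice(query_model(prompt))
            correct += prediction == item["answer"]
        return correct / len(items)

Open-ended scoring is harder to automate in this style, since free-text answers must be judged for correctness rather than matched against an option letter, which is consistent with the lower open-ended accuracy the abstract reports.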

Original language: English
Article number: 9089
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue number: 16
State: Published - Aug 2025

Keywords

  • LLM evaluation
  • large language models
  • psychological question answering
