Preference Learning Unlocks LLMs' Psycho-Counseling Skills

Abstract

Applying large language models (LLMs) to assist in psycho-counseling is an emerging and meaningful approach, driven by the significant gap between patient needs and the availability of mental health support. However, current LLMs struggle to respond effectively and consistently to client speech, largely because they lack supervision from high-quality real psycho-counseling data, which is typically inaccessible due to client privacy concerns. Furthermore, the quality of therapists' responses in the sessions that are available varies significantly with their professional training and experience, and assessing that quality remains an open challenge. In this work, we address these challenges by first proposing a set of professional and comprehensive principles for evaluating therapists' responses to client speech. Using these principles, we create a preference dataset, PsychoCounsel-Preference, which contains 36k high-quality preference comparison pairs. The dataset aligns with the preferences of professional psychotherapists and provides a robust foundation for evaluating and improving LLMs in psycho-counseling. Experiments on reward modeling and preference learning demonstrate that PsychoCounsel-Preference is an excellent resource for LLMs to acquire the essential skills for responding to clients in a counseling session. Our best-aligned model, PsychoCounsel-Llama3-8B, achieves a win rate of 87% against GPT-4o. We release PsychoCounsel-Preference, PsychoCounsel-Llama3-8B and the reward model PsychoCounsel Llama3-8B-Reward to facilitate research on psycho-counseling with LLMs at: https://hf.co/Psychotherapy-LLM.
