CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy

Abstract

There is a significant gap between patient needs and available mental health support today. In this paper, we thoroughly examine the potential of Large Language Models (LLMs) to assist professional psychotherapy. To this end, we propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance. CBT-BENCH includes three levels of tasks: I: Basic CBT knowledge acquisition, tested with multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions. These tasks cover key aspects of CBT that could potentially be enhanced through AI assistance, and they outline a hierarchy of capability requirements, ranging from basic knowledge recitation to engaging in real therapeutic conversations. We evaluate representative LLMs on our benchmark. Experimental results indicate that while LLMs perform well at reciting CBT knowledge, they fall short in complex real-world scenarios that require deep analysis of patients' cognitive structures and the generation of effective responses, which points to directions for future work.
