ChatPaper.aiChatPaper

超越“拒绝”:量化AI过度拒绝与情感依恋的边界

Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries

February 20, 2025
作者: David Noever, Grant Rosario
cs.AI

摘要

我们提出了一种开源基准和评估框架,用于评估大型语言模型(LLMs)在处理情感边界方面的表现。通过使用包含六种语言、共计1156条提示的数据集,我们对三种领先的LLM(GPT-4o、Claude-3.5 Sonnet和Mistral-large)进行了评估,重点考察它们通过模式匹配响应分析来维持适当情感边界的能力。该框架量化了七种关键模式下的响应:直接拒绝、道歉、解释、转移话题、承认、设定边界和情感意识。结果显示,各模型在处理边界的方法上存在显著差异,其中Claude-3.5获得了最高综合得分(8.69/10),并生成了更长、更细致的响应(平均86.51词)。我们发现,英语(平均得分25.62)与非英语互动(<0.22)之间存在显著的性能差距,英语响应显示出明显更高的拒绝率(43.20%对比非英语的<1%)。模式分析揭示了模型特定的策略,如Mistral倾向于转移话题(4.2%),而所有模型在共情得分上均表现较低(<0.06)。本研究的局限包括模式匹配可能导致的过度简化、响应分析中缺乏上下文理解,以及对复杂情感响应的二元分类。未来工作应探索更细致的评分方法,扩大语言覆盖范围,并研究文化差异对情感边界期望的影响。我们的基准和方法为系统评估LLM的情商和边界设定能力提供了基础。
English
We present an open-source benchmark and evaluation framework for assessing emotional boundary handling in Large Language Models (LLMs). Using a dataset of 1156 prompts across six languages, we evaluated three leading LLMs (GPT-4o, Claude-3.5 Sonnet, and Mistral-large) on their ability to maintain appropriate emotional boundaries through pattern-matched response analysis. Our framework quantifies responses across seven key patterns: direct refusal, apology, explanation, deflection, acknowledgment, boundary setting, and emotional awareness. Results demonstrate significant variation in boundary-handling approaches, with Claude-3.5 achieving the highest overall score (8.69/10) and producing longer, more nuanced responses (86.51 words on average). We identified a substantial performance gap between English (average score 25.62) and non-English interactions (< 0.22), with English responses showing markedly higher refusal rates (43.20% vs. < 1% for non-English). Pattern analysis revealed model-specific strategies, such as Mistral's preference for deflection (4.2%) and consistently low empathy scores across all models (< 0.06). Limitations include potential oversimplification through pattern matching, lack of contextual understanding in response analysis, and binary classification of complex emotional responses. Future work should explore more nuanced scoring methods, expand language coverage, and investigate cultural variations in emotional boundary expectations. Our benchmark and methodology provide a foundation for systematic evaluation of LLM emotional intelligence and boundary-setting capabilities.

Summary

AI-Generated Summary

PDF03February 24, 2025