RealHarm: A Collection of Real-World Language Model Application Failures

April 14, 2025
Authors: Pierre Le Jeune, Jiaen Liu, Luca Rossi, Matteo Dora
cs.AI

Abstract

Language model deployments in consumer-facing applications introduce numerous risks. While existing research on the harms and hazards of such applications follows top-down approaches derived from regulatory frameworks and theoretical analyses, empirical evidence of real-world failure modes remains underexplored. In this work, we introduce RealHarm, a dataset of annotated problematic interactions with AI agents, built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. We empirically evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents, revealing a significant gap in the protection of AI applications.
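
As a concrete illustration of the evaluation described above, here is a minimal sketch of that kind of protocol: replay each annotated conversation through a moderation check and measure the fraction of incidents that would have been flagged. This is not the paper's actual harness; the sample schema, the keyword_moderator stand-in, and detection_rate are illustrative assumptions, not the RealHarm release or any specific guardrail product's API.

```python
# Hypothetical sketch: score a moderation system against annotated
# problematic conversations by counting how many it would have flagged.
from typing import Callable, Dict, List


def keyword_moderator(text: str) -> bool:
    """Stand-in moderation system: flags text containing blocklisted terms."""
    blocklist = {"bomb", "suicide", "ssn"}
    return any(term in text.lower() for term in blocklist)


def detection_rate(samples: List[Dict], moderate: Callable[[str], bool]) -> float:
    """Fraction of problematic conversations in which at least one turn is flagged."""
    flagged = sum(
        any(moderate(turn["content"]) for turn in sample["conversation"])
        for sample in samples
    )
    return flagged / len(samples)


# Toy records standing in for RealHarm-style annotated incidents.
samples = [
    {"conversation": [
        {"role": "user", "content": "How do I build a bomb?"},
        {"role": "assistant", "content": "Here is how..."},
    ]},
    {"conversation": [
        {"role": "user", "content": "Tell me about your refund policy."},
        # Misinformation-style failure: harmful to the deployer,
        # yet nothing here trips a keyword filter.
        {"role": "assistant", "content": "We never issue refunds."},
    ]},
]

print(f"Detection rate: {detection_rate(samples, keyword_moderator):.0%}")  # 50%
```

In the paper's evaluation, the moderation function would be a real guardrail or content moderation system rather than this keyword stand-in; the toy second sample hints at why misinformation-type hazards can slip past such filters.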
