

RealHarm: A Collection of Real-World Language Model Application Failures

April 14, 2025
Authors: Pierre Le Jeune, Jiaen Liu, Luca Rossi, Matteo Dora
cs.AI

Abstract

Language model deployments in consumer-facing applications introduce numerous risks. While existing research on harms and hazards of such applications follows top-down approaches derived from regulatory frameworks and theoretical analyses, empirical evidence of real-world failure modes remains scarce. In this work, we introduce RealHarm, a dataset of annotated problematic interactions with AI agents built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. We empirically evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents, revealing a significant gap in the protection of AI applications.

