

Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models

February 20, 2025
Authors: Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Mehrab Tanjim, Kibum Kim, Chanyoung Park
cs.AI

Abstract

As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Existing benchmarks evaluate many aspects of LLM safety, but they define safety almost exclusively against general standards and overlook user-specific ones. Yet the safety standard for an LLM may vary with an individual user's profile rather than being universally consistent across all users. This raises a critical research question: do LLM agents act safely when user-specific safety standards are taken into account? Despite its importance for safe LLM use, no benchmark dataset currently exists for evaluating the user-specific safety of LLMs. To fill this gap, we introduce U-SAFEBENCH, the first benchmark designed to assess the user-specific aspect of LLM safety. Our evaluation of 18 widely used LLMs reveals that current LLMs fail to act safely when user-specific safety standards are considered, a new finding in this field. To address this vulnerability, we propose a simple chain-of-thought-based remedy and demonstrate its effectiveness in improving user-specific safety. Our benchmark and code are available at https://github.com/yeonjun-in/U-SafeBench.
