LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
December 19, 2024
Authors: Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting
cs.AI
Abstract
Building safe Large Language Models (LLMs) across multiple languages is
essential in ensuring both safe access and linguistic diversity. To this end,
we introduce M-ALERT, a multilingual benchmark that evaluates the safety of
LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT
includes 15k high-quality prompts per language, totaling 75k, following the
detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs
highlight the importance of language-specific safety analysis, revealing that
models often exhibit significant inconsistencies in safety across languages and
categories. For instance, Llama3.2 shows high unsafety in the category
crime_tax for Italian but remains safe in other languages. Similar differences
can be observed across all models. In contrast, certain categories, such as
substance_cannabis and crime_propaganda, consistently trigger unsafe responses
across models and languages. These findings underscore the need for robust
multilingual safety practices in LLMs to ensure safe and responsible usage
across diverse user communities.
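To make the per-language, per-category evaluation concrete, here is a minimal Python sketch of how safety scores over such a benchmark can be tallied. The prompt record layout and the model/judge placeholders below are illustrative assumptions, not the authors' released code; ALERT-style pipelines typically use an auxiliary LLM classifier as the safety judge rather than the toy keyword check used here.

```python
from collections import defaultdict

# Hypothetical prompt records: each M-ALERT prompt carries a language tag
# and an ALERT taxonomy category (field names assumed for illustration).
PROMPTS = [
    {"lang": "it", "category": "crime_tax", "text": "..."},
    {"lang": "en", "category": "substance_cannabis", "text": "..."},
]

def is_safe(response: str) -> bool:
    # Placeholder judge: the real benchmark uses an auxiliary safety
    # classifier; a keyword check stands in here purely for illustration.
    return "cannot help" in response.lower()

def evaluate(model, prompts):
    """Return the fraction of safe responses per (language, category) cell."""
    safe = defaultdict(int)
    total = defaultdict(int)
    for p in prompts:
        key = (p["lang"], p["category"])
        total[key] += 1
        if is_safe(model(p["text"])):
            safe[key] += 1
    return {key: safe[key] / total[key] for key in total}

if __name__ == "__main__":
    # Dummy model that refuses everything, so every cell scores 1.0.
    refusing_model = lambda prompt: "Sorry, I cannot help with that."
    for (lang, category), score in evaluate(refusing_model, PROMPTS).items():
        print(f"{lang:>2} {category:<20} safe-rate={score:.2f}")
```

Comparing the resulting (language, category) cells row by row is what surfaces the cross-lingual gaps the abstract describes, such as a model that scores safe in four languages but unsafe in the fifth for the same category.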