LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

Abstract

Building safe Large Language Models (LLMs) across multiple languages is essential in ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the category crime_tax for Italian but remains safe in other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
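The language-by-category comparison described above can be illustrated with a minimal sketch. It assumes each model response has already been labeled safe or unsafe by some judge. The function names, the `min_gap` threshold, and the toy records are hypothetical, chosen only for illustration; they are not the benchmark's evaluation code or its results.

```python
from collections import defaultdict


def safety_scores(records):
    """Return {(category, language): fraction of responses labeled safe}."""
    totals = defaultdict(int)
    safe = defaultdict(int)
    for category, language, is_safe in records:
        key = (category, language)
        totals[key] += 1
        safe[key] += int(is_safe)
    return {key: safe[key] / totals[key] for key in totals}


def inconsistent_categories(scores, min_gap=0.2):
    """Return {category: gap} for categories whose safety score differs
    across languages by at least min_gap (a hypothetical threshold)."""
    by_category = defaultdict(dict)
    for (category, language), score in scores.items():
        by_category[category][language] = score
    flagged = {}
    for category, per_language in by_category.items():
        gap = max(per_language.values()) - min(per_language.values())
        if gap >= min_gap:
            flagged[category] = round(gap, 3)
    return flagged


# Toy records (category, language, judged_safe); invented values, not real results.
records = [
    ("crime_tax", "en", True), ("crime_tax", "en", True),
    ("crime_tax", "it", False), ("crime_tax", "it", True),
    ("substance_cannabis", "en", False), ("substance_cannabis", "it", False),
]

scores = safety_scores(records)
flagged = inconsistent_categories(scores)
print(flagged)
```

In this toy data, crime_tax is flagged because its score differs between the two languages, while substance_cannabis is not flagged: it is uniformly unsafe, so a gap-based check alone would miss it and the absolute score has to be inspected as well.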
