VLSBench: Unveiling Visual Leakage in Multimodal Safety
November 29, 2024
Authors: Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao
cs.AI
Abstract
Safety concerns about multimodal large language models (MLLMs) have gradually
become an important problem in various applications. Surprisingly, previous
works report a counter-intuitive phenomenon: aligning MLLMs with textual
unlearning achieves safety performance comparable to that of MLLMs trained with
image-text pairs. To explain this phenomenon, we discover a
visual safety information leakage (VSIL) problem in existing multimodal safety
benchmarks, i.e., the potentially risky and sensitive content in the image has
been revealed in the textual query. As a result, MLLMs can easily refuse these
sensitive image-text queries based on the textual query alone. However, image-text
pairs without VSIL are common in real-world scenarios and are overlooked by
existing multimodal safety benchmarks. To this end, we construct the multimodal
visual leakless safety benchmark (VLSBench), which prevents visual safety leakage
from image to textual query and contains 2.4k image-text pairs. Experimental results
indicate that VLSBench poses a significant challenge to both open-source and
closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o.
This study demonstrates that textual alignment is enough for multimodal safety
scenarios with VSIL, while multimodal alignment is a more promising solution
for multimodal safety scenarios without VSIL. Please see our code and data at:
http://hxhcreate.github.io/VLSBench