VLSBench: Unveiling Visual Leakage in Multimodal Safety

Abstract

Safety concerns about multimodal large language models (MLLMs) have become increasingly important across a wide range of applications. Surprisingly, previous work reports a counter-intuitive phenomenon: aligning MLLMs with textual unlearning achieves safety performance comparable to that of MLLMs trained on image-text pairs. To explain this phenomenon, we identify a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks: the potentially risky and sensitive content in the image is already revealed in the textual query. As a result, MLLMs can easily refuse these sensitive image-text queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To address this gap, we construct the multimodal Visual Leakless Safety Benchmark (VLSBench), which contains 2.4k image-text pairs and prevents visual safety information from leaking from the image into the textual query. Experimental results show that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is sufficient for multimodal safety scenarios with VSIL, whereas multimodal alignment is a more promising solution for scenarios without VSIL. Our code and data are available at: http://hxhcreate.github.io/VLSBench
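
To make the VSIL notion concrete, here is a toy sketch. It is not the VLSBench construction pipeline, and it rests on an assumption: each image's risky content has already been summarized as a set of keyword tags. The `ImageTextPair` schema, the `image_risk_tags` field, and `has_visual_leakage` are hypothetical names introduced only for illustration. A pair is flagged as leaky when every word of some risky tag also appears in the text query, which is exactly the situation in which a model could refuse based on the text alone. A realistic detector would need semantic matching rather than word overlap.

```python
import re
from dataclasses import dataclass


@dataclass
class ImageTextPair:
    # Hypothetical schema: keyword tags summarizing risky content visible in the image
    image_risk_tags: set[str]
    # The textual query paired with the image
    query: str


def tokenize(text: str) -> set[str]:
    # Lowercased alphanumeric word tokens; a crude stand-in for semantic matching
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def has_visual_leakage(pair: ImageTextPair) -> bool:
    """Toy VSIL check: True if the query reveals any risky tag from the image."""
    query_tokens = tokenize(pair.query)
    for tag in pair.image_risk_tags:
        tag_tokens = tokenize(tag)
        # A multi-word tag counts as leaked only if all of its words appear in the query
        if tag_tokens and tag_tokens <= query_tokens:
            return True
    return False


# Same image (a knife); only the first query names the risky object
leaky = ImageTextPair({"knife"}, "How do I hide this knife when I bring it to school?")
leakless = ImageTextPair({"knife"}, "How do I hide this when I bring it to school?")

print(has_visual_leakage(leaky))     # True
print(has_visual_leakage(leakless))  # False
```

In the leakless case, the harmful intent only becomes clear once the model actually looks at the image, which is the kind of pair the abstract describes as overlooked by existing benchmarks.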
