MLLM-as-a-Judge for Image Safety without Human Labeling
December 31, 2024
Authors: Zhenting Wang, Shuming Hu, Shiyu Zhao, Xiaowen Lin, Felix Juefei-Xu, Zhuowei Li, Ligong Han, Harihar Subramanyam, Li Chen, Jianfa Chen, Nan Jiang, Lingjuan Lyu, Shiqing Ma, Dimitris N. Metaxas, Ankit Jain
cs.AI
Abstract
Image content safety has become a significant challenge with the rise of
visual media on online platforms. Meanwhile, in the age of AI-generated content
(AIGC), many image generation models are capable of producing harmful content,
such as images containing sexual or violent material. Thus, it becomes crucial
to identify such unsafe images based on established safety rules. Pre-trained
Multimodal Large Language Models (MLLMs) offer potential in this regard, given
their strong pattern recognition abilities. Existing approaches typically
fine-tune MLLMs with human-labeled datasets, which, however, brings a series of
drawbacks. First, relying on human annotators to label data following intricate
and detailed guidelines is both expensive and labor-intensive. Furthermore,
users of safety judgment systems may need to frequently update safety rules,
making fine-tuning based on human annotations more challenging. This raises the
research question: Can we detect unsafe images by querying MLLMs in a zero-shot
setting using a predefined safety constitution (a set of safety rules)? Our
research showed that simply querying pre-trained MLLMs does not yield
satisfactory results. This lack of effectiveness stems from factors such as the
subjectivity of safety rules, the complexity of lengthy constitutions, and the
inherent biases in the models. To address these challenges, we propose an
MLLM-based method that includes objectifying safety rules, assessing the relevance
between rules and images, making quick judgments based on debiased token
probabilities with logically complete yet simplified precondition chains for
safety rules, and conducting more in-depth reasoning with cascaded
chain-of-thought processes if necessary. Experimental results demonstrate that
our method is highly effective for zero-shot image safety judgment tasks.
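The abstract outlines a multi-stage pipeline: objectified safety rules, a rule-image relevance filter, a quick judgment from debiased token probabilities over simplified precondition chains, and a cascaded chain-of-thought fallback for borderline cases. The Python sketch below illustrates one plausible way these stages could be wired together; `mllm_token_probs`, `mllm_generate`, the prompt wording, and the thresholds are all hypothetical placeholders for whatever MLLM backend is used, not the paper's actual implementation.

```python
# Minimal, assumption-laden sketch of a zero-shot image safety judge.
# `mllm_token_probs` and `mllm_generate` are hypothetical stand-ins for a
# pretrained MLLM backend; prompts and thresholds are illustrative only.

from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    objective_text: str  # rule rewritten into an objective, checkable form


def mllm_token_probs(image, prompt) -> dict:
    """Placeholder: next-token probabilities (e.g. {'yes': p, 'no': q})
    from an MLLM conditioned on the image (or None) and the prompt."""
    raise NotImplementedError


def mllm_generate(image, prompt) -> str:
    """Placeholder: free-form MLLM generation for chain-of-thought reasoning."""
    raise NotImplementedError


def debiased_yes_score(image, question: str) -> float:
    """Quick judgment: subtract the 'yes' probability of a content-free baseline
    query from the image-conditioned one, reducing the model's yes/no bias."""
    with_image = mllm_token_probs(image, question)
    baseline = mllm_token_probs(None, question)  # same question, no image
    return with_image.get("yes", 0.0) - baseline.get("yes", 0.0)


def judge_image(image, rules: list[Rule],
                relevance_threshold: float = 0.5,
                confident_margin: float = 0.3) -> bool:
    """Return True if the image violates any rule in the safety constitution."""
    for rule in rules:
        # 1) Skip rules that are clearly irrelevant to this image.
        relevance = debiased_yes_score(
            image,
            f"Is this image relevant to the topic '{rule.name}'? Answer yes or no.")
        if relevance < relevance_threshold:
            continue

        # 2) Quick judgment from debiased token probabilities on the
        #    objectified rule (its simplified precondition chain).
        score = debiased_yes_score(
            image,
            f"Does this image satisfy: {rule.objective_text}? Answer yes or no.")
        if score > confident_margin:
            return True           # confidently unsafe under this rule
        if score < -confident_margin:
            continue              # confidently safe under this rule

        # 3) Borderline case: fall back to cascaded chain-of-thought reasoning.
        reasoning = mllm_generate(
            image,
            f"Check each precondition of the rule '{rule.objective_text}' "
            f"step by step, then conclude with VIOLATION or NO VIOLATION.")
        if "NO VIOLATION" not in reasoning.upper():
            return True
    return False
```

In this sketch, the debiasing step subtracts a content-free baseline query so that a model that habitually answers "yes" does not flag every image, and the more expensive chain-of-thought check is invoked only when the quick score falls inside the inconclusive margin.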