MLLM-as-a-Judge for Image Safety without Human Labeling

Abstract

Image content safety has become a significant challenge with the rise of visual media on online platforms. Meanwhile, in the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content, such as images containing sexual or violent material. It is therefore crucial to identify such unsafe images based on established safety rules. Pre-trained Multimodal Large Language Models (MLLMs) offer potential in this regard, given their strong pattern recognition abilities. Existing approaches typically fine-tune MLLMs on human-labeled datasets, which brings a series of drawbacks. First, relying on human annotators to label data following intricate and detailed guidelines is both expensive and labor-intensive. Second, users of safety judgment systems may need to update safety rules frequently, which makes fine-tuning on human annotations even more challenging. This raises the research question: can we detect unsafe images by querying MLLMs in a zero-shot setting using a predefined safety constitution (a set of safety rules)? Our research shows that simply querying pre-trained MLLMs does not yield satisfactory results. This lack of effectiveness stems from factors such as the subjectivity of safety rules, the complexity of lengthy constitutions, and the inherent biases in the models. To address these challenges, we propose an MLLM-based method that includes objectifying safety rules, assessing the relevance between rules and images, making quick judgments based on debiased token probabilities with logically complete yet simplified precondition chains for safety rules, and conducting more in-depth reasoning with cascaded chain-of-thought processes if necessary. Experimental results demonstrate that our method is highly effective for zero-shot image safety judgment tasks.
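The cascade outlined in the abstract (relevance check, quick judgment from debiased token probabilities over a precondition chain, then deeper reasoning only if needed) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the names `Rule`, `judge`, `debiased_score`, the stub scorers, and the thresholds are hypothetical, and the debiasing step is assumed to subtract a text-only prior from the image-conditioned "yes" probability, a detail the abstract does not specify. A real system would replace the stubs with MLLM queries.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical scorer interfaces; a real system would wrap MLLM calls here.
YesProb = Callable[[str, Optional[str]], float]  # P("yes" | question, image or None)
Relevance = Callable[[str, str], float]          # relevance of a rule to an image
DeepReason = Callable[[str, str], bool]          # chain-of-thought verdict


@dataclass
class Rule:
    text: str                 # objectified rule, e.g. "Shows a firearm."
    preconditions: List[str]  # simplified chain; all must hold for a violation


def debiased_score(yes_prob: YesProb, question: str, image: str) -> float:
    """Assumed debiasing: image-conditioned score minus the text-only prior."""
    return yes_prob(question, image) - yes_prob(question, None)


def judge(image: str, rules: List[Rule], relevance: Relevance,
          yes_prob: YesProb, deep_reason: DeepReason,
          rel_min: float = 0.5, hi: float = 0.3, lo: float = 0.05) -> List[str]:
    """Return the texts of the rules judged to be violated by the image."""
    violated = []
    for rule in rules:
        # Step 1: skip rules that are irrelevant to this image.
        if relevance(rule.text, image) < rel_min:
            continue
        # Step 2: quick judgment; the weakest precondition decides.
        weakest = min(debiased_score(yes_prob, q, image)
                      for q in rule.preconditions)
        if weakest >= hi:
            violated.append(rule.text)
        # Step 3: ambiguous scores fall through to deeper reasoning.
        elif weakest > lo and deep_reason(rule.text, image):
            violated.append(rule.text)
    return violated


# Stub scorers with made-up numbers, standing in for model queries.
PROBS = {
    ("Is a person visible?", "img-001"): 0.95, ("Is a person visible?", None): 0.50,
    ("Is blood visible?", "img-001"): 0.90, ("Is blood visible?", None): 0.40,
    ("Is a firearm visible?", "img-001"): 0.50, ("Is a firearm visible?", None): 0.35,
}


def stub_yes_prob(question: str, image: Optional[str]) -> float:
    return PROBS[(question, image)]


def stub_relevance(rule_text: str, image: str) -> float:
    return 0.1 if "nudity" in rule_text else 0.9


def stub_deep_reason(rule_text: str, image: str) -> bool:
    return False


RULES = [
    Rule("Shows a person bleeding.", ["Is a person visible?", "Is blood visible?"]),
    Rule("Shows a firearm.", ["Is a firearm visible?"]),
    Rule("Shows nudity.", ["Is a person visible?", "Is the person unclothed?"]),
]

violations = judge("img-001", RULES, stub_relevance, stub_yes_prob, stub_deep_reason)
print(violations)
```

In this toy run the first rule is decided by the quick path, the second lands in the ambiguous band and is handed to the deeper reasoning stub, and the third is dropped by the relevance check before any precondition is scored. Because the constitution is plain data passed to `judge`, changing a rule means editing the list rather than relabeling data, which is the zero-shot property the abstract argues for.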
