YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment
February 5, 2025
Authors: Amitava Das, Yaswanth Narsupalli, Gurpreet Singh, Vinija Jain, Vasu Sharma, Suranjana Trivedy, Aman Chadha, Amit Sheth
cs.AI
Abstract
Precise alignment in Text-to-Image (T2I) systems is crucial to ensure that
generated visuals not only accurately encapsulate user intents but also conform
to stringent ethical and aesthetic benchmarks. Incidents like the Google Gemini
fiasco, where misaligned outputs triggered significant public backlash,
underscore the critical need for robust alignment mechanisms. In contrast,
Large Language Models (LLMs) have achieved notable success in alignment.
Building on these advancements, researchers are eager to apply similar
alignment techniques, such as Direct Preference Optimization (DPO), to T2I
systems to enhance image generation fidelity and reliability.
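As background, and not stated in the abstract itself: Direct Preference Optimization trains a policy against a frozen reference model on preference triples consisting of a prompt, a chosen output, and a rejected output. A minimal sketch of the standard single-objective DPO loss, assuming the usual formulation from the DPO literature rather than the multi-objective variant this paper proposes:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```

Applied to T2I, x is the user prompt while y_w and y_l are the chosen and rejected images, and beta controls how far the fine-tuned model may drift from the reference model.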
We present YinYangAlign, an advanced benchmarking framework that
systematically quantifies the alignment fidelity of T2I systems, addressing six
fundamental and inherently contradictory design objectives. Each pair
represents fundamental tensions in image generation, such as balancing
adherence to user prompts with creative modifications or maintaining diversity
alongside visual coherence. YinYangAlign includes detailed axiom datasets
featuring human prompts, aligned (chosen) responses, misaligned (rejected)
AI-generated outputs, and explanations of the underlying contradictions.
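To make the described dataset structure concrete, the following is a minimal sketch of how one axiom-dataset record could be represented; the class and field names are illustrative assumptions, not the benchmark's released schema:

```python
from dataclasses import dataclass


@dataclass
class YinYangAlignRecord:
    """One hypothetical preference record for a single contradictory-objective axiom.

    Field names are illustrative assumptions, not the paper's released format.
    """
    axiom: str               # e.g. "prompt faithfulness vs. creative modification"
    prompt: str              # human-written text prompt
    chosen_image: str        # path or URL of the aligned (chosen) generation
    rejected_image: str      # path or URL of the misaligned (rejected) generation
    contradiction_note: str  # explanation of the tension between the two objectives


# Example usage with placeholder values.
record = YinYangAlignRecord(
    axiom="prompt faithfulness vs. creative modification",
    prompt="a watercolor painting of a lighthouse at dawn",
    chosen_image="images/lighthouse_chosen.png",
    rejected_image="images/lighthouse_rejected.png",
    contradiction_note="The rejected output adds creative elements that drift from the prompt.",
)
```

Each such record pairs one human prompt with a chosen and a rejected generation under a specific contradictory-objective axiom, plus an explanation of the underlying tension, which is the granularity a DPO-style preference loss consumes.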