Dual Caption Preference Optimization for Diffusion Models
February 9, 2025
Authors: Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral
cs.AI
Abstract
Recent advancements in human preference optimization, originally developed
for Large Language Models (LLMs), have shown significant potential in improving
text-to-image diffusion models. These methods aim to learn the distribution of
preferred samples while distinguishing them from less preferred ones. However,
existing preference datasets often exhibit overlap between these distributions,
leading to a conflict distribution. Additionally, we identified that input
prompts contain irrelevant information for less preferred images, limiting the
denoising network's ability to accurately predict noise in preference
optimization methods, known as the irrelevant prompt issue. To address these
challenges, we propose Dual Caption Preference Optimization (DCPO), a novel
approach that utilizes two distinct captions to mitigate irrelevant prompts. To
tackle conflict distribution, we introduce the Pick-Double Caption dataset, a
modified version of Pick-a-Pic v2 with separate captions for preferred and less
preferred images. We further propose three different strategies for generating
distinct captions: captioning, perturbation, and hybrid methods. Our
experiments show that DCPO significantly improves image quality and relevance
to prompts, outperforming Stable Diffusion (SD) 2.1, SFT_Chosen, Diffusion-DPO,
and MaPO across multiple metrics, including Pickscore, HPSv2.1, GenEval,
CLIPscore, and ImageReward, with all methods fine-tuned on SD 2.1 as the backbone.
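The core idea described above — a DPO-style objective over denoising errors, where the preferred and less-preferred images are each conditioned on their own caption — can be sketched as follows. This is a hypothetical illustration assuming a Diffusion-DPO-style implicit reward; the function name, tensor shapes, and `beta` value are placeholders, not the paper's actual implementation.

```python
# Minimal sketch of a dual-caption, Diffusion-DPO-style preference loss.
# Hypothetical: the error formulation and hyperparameters are assumptions,
# not DCPO's released code.
import torch
import torch.nn.functional as F

def dual_caption_dpo_loss(eps_theta_w, eps_ref_w, eps_theta_l, eps_ref_l,
                          noise_w, noise_l, beta=0.1):
    """DPO-style loss over per-sample denoising errors.

    eps_theta_w / eps_ref_w: policy / reference noise predictions for the
        preferred image, conditioned on its own caption c_w.
    eps_theta_l / eps_ref_l: the same for the less-preferred image,
        conditioned on a distinct caption c_l (the "dual caption" idea).
    """
    # Per-sample denoising error: MSE against the true injected noise.
    def err(pred, noise):
        return ((pred - noise) ** 2).flatten(1).mean(dim=1)

    # Implicit reward: how much better the policy denoises than the reference.
    r_w = err(eps_ref_w, noise_w) - err(eps_theta_w, noise_w)
    r_l = err(eps_ref_l, noise_l) - err(eps_theta_l, noise_l)

    # Push the margin r_w - r_l up, so the preferred sample wins.
    return -F.logsigmoid(beta * (r_w - r_l)).mean()

# Toy usage: random tensors stand in for the denoiser's predictions.
torch.manual_seed(0)
shape = (4, 3, 8, 8)  # (batch, channels, height, width)
noise_w, noise_l = torch.randn(shape), torch.randn(shape)
loss = dual_caption_dpo_loss(torch.randn(shape), torch.randn(shape),
                             torch.randn(shape), torch.randn(shape),
                             noise_w, noise_l)
print(float(loss))
```

Conditioning each image on its own caption (rather than a single shared prompt) is what the abstract's "irrelevant prompt" fix amounts to: the less-preferred branch is no longer asked to denoise under a prompt that does not describe it.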