

Scalable Ranked Preference Optimization for Text-to-Image Generation

October 23, 2024
Authors: Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata, Sergey Tulyakov, Jian Ren, Anil Kag
cs.AI

Abstract

Direct Preference Optimization (DPO) has emerged as a powerful approach to align text-to-image (T2I) models with human feedback. Unfortunately, successful application of DPO to T2I models requires a huge amount of resources to collect and label large-scale datasets, e.g., millions of generated paired images annotated with human preferences. In addition, these human preference datasets can get outdated quickly as rapid improvements in T2I models lead to higher-quality images. In this work, we investigate a scalable approach for collecting large-scale, fully synthetic datasets for DPO training. Specifically, the preferences for paired images are generated using a pre-trained reward function, eliminating the need to involve humans in the annotation process and greatly improving dataset collection efficiency. Moreover, we demonstrate that such datasets allow averaging predictions across multiple models and collecting ranked preferences as opposed to pairwise preferences. Furthermore, we introduce RankDPO to enhance DPO-based methods using ranking feedback. Applying RankDPO to SDXL and SD3-Medium models with our synthetically generated preference dataset "Syn-Pic" improves both prompt-following (on benchmarks like T2I-Compbench, GenEval, and DPG-Bench) and visual quality (through user studies). This pipeline presents a practical and scalable solution for developing better preference datasets to enhance the performance of text-to-image models.
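As a rough illustration of the annotation pipeline described in the abstract, the sketch below scores several candidate images for one prompt with an ensemble of pre-trained reward functions, averages the scores, and derives a ranking. The `rank_candidates` helper and the `reward_models` callables are hypothetical placeholders (not the paper's released code or any specific reward-model API).

```python
import numpy as np

def rank_candidates(prompt, images, reward_models):
    # Score every candidate image with each pre-trained reward model.
    # `reward_models` are placeholder callables: score(prompt, image) -> float.
    scores = np.array([[rm(prompt, img) for img in images] for rm in reward_models])
    # Average over the reward-model ensemble to reduce single-model bias.
    avg_scores = scores.mean(axis=0)
    # Sort candidates from highest to lowest averaged score: a full ranking,
    # which carries more supervision than a single pairwise preference.
    ranking = np.argsort(-avg_scores)
    return ranking, avg_scores
```

Because no human labeling is involved, the same procedure can be rerun cheaply whenever newer T2I models produce fresh candidate images, which is the scalability argument made above.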

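To show how ranked feedback can plug into a DPO-style objective, here is a minimal sketch that decomposes a ranking over K candidates into weighted pairwise logistic terms on implicit rewards (policy-to-reference log-probability ratios). The function name and the DCG-style discount are assumptions for illustration only, not the paper's exact RankDPO loss.

```python
import torch
import torch.nn.functional as F

def ranked_pairwise_dpo_loss(policy_logratios, ranks, beta=1.0):
    # policy_logratios: (K,) tensor of implicit rewards,
    #   log pi_theta(x_i | c) - log pi_ref(x_i | c), one per candidate image.
    # ranks: (K,) integer tensor, 0 = most preferred candidate.
    loss, num_pairs = policy_logratios.new_zeros(()), 0
    K = policy_logratios.shape[0]
    for i in range(K):
        for j in range(K):
            if ranks[i] < ranks[j]:  # candidate i is ranked above candidate j
                margin = beta * (policy_logratios[i] - policy_logratios[j])
                # DCG-style discount: pairs won by top-ranked candidates count more
                # (an illustrative assumption, not the paper's exact weighting).
                weight = 1.0 / torch.log2(torch.tensor(2.0) + ranks[i].float())
                loss = loss - weight * F.logsigmoid(margin)
                num_pairs += 1
    return loss / max(num_pairs, 1)

# Example usage with 4 candidates:
# ranked_pairwise_dpo_loss(torch.randn(4), torch.tensor([2, 0, 3, 1]))
```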