From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

April 22, 2025
Authors: Le Zhuo, Liangbing Zhao, Sayak Paul, Yue Liao, Renrui Zhang, Yi Xin, Peng Gao, Mohamed Elhoseiny, Hongsheng Li
cs.AI

Abstract

Recent text-to-image diffusion models achieve impressive visual quality through extensive scaling of training data and model parameters, yet they often struggle with complex scenes and fine-grained details. Inspired by the self-reflection capabilities that emerge in large language models, we propose ReflectionFlow, an inference-time framework that enables diffusion models to iteratively reflect upon and refine their outputs. ReflectionFlow introduces three complementary inference-time scaling axes: (1) noise-level scaling to optimize latent initialization; (2) prompt-level scaling for precise semantic guidance; and, most notably, (3) reflection-level scaling, which explicitly provides actionable reflections to iteratively assess and correct previous generations. To facilitate reflection-level scaling, we construct GenRef, a large-scale dataset comprising 1 million triplets, each containing a reflection, a flawed image, and an enhanced image. Leveraging this dataset, we efficiently perform reflection tuning on FLUX.1-dev, a state-of-the-art diffusion transformer, by jointly modeling multimodal inputs within a unified framework. Experimental results show that ReflectionFlow significantly outperforms naive noise-level scaling methods, offering a scalable and compute-efficient solution toward higher-quality image synthesis on challenging tasks.
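
The three scaling axes named in the abstract can be pictured as one search loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `reflection_loop`, `Candidate`, and the `toy_*` functions are hypothetical names, an "image" is reduced to a single float, and the verifier, reflection writer, and prompt rewriter are trivial stand-ins for the models a real system would call.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    prompt: str
    seed: int
    image: float  # toy stand-in; a real system would hold pixels or latents
    score: float


def reflection_loop(
    prompt: str,
    generate: Callable[[str, int, Optional[float], Optional[str]], float],
    score: Callable[[str, float], float],
    reflect: Callable[[str, float], str],
    refine_prompt: Callable[[str, str], str],
    num_seeds: int = 4,
    num_rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> Candidate:
    """Hypothetical generate, verify, reflect, refine loop (illustrative only)."""
    if num_seeds < 1 or num_rounds < 1:
        raise ValueError("num_seeds and num_rounds must be at least 1")
    rng = rng or random.Random(0)
    best: Optional[Candidate] = None
    current_prompt = prompt
    prev_image: Optional[float] = None
    reflection: Optional[str] = None

    for _round in range(num_rounds):
        # Noise-level axis: try several latent initializations (seeds).
        seeds = [rng.randrange(2**31) for _ in range(num_seeds)]
        for seed in seeds:
            image = generate(current_prompt, seed, prev_image, reflection)
            # Always score against the user's original prompt.
            cand = Candidate(current_prompt, seed, image, score(prompt, image))
            if best is None or cand.score > best.score:
                best = cand
        # Reflection-level axis: describe what is wrong with the best image so far.
        reflection = reflect(prompt, best.image)
        # Prompt-level axis: rewrite the prompt using that feedback.
        current_prompt = refine_prompt(prompt, reflection)
        prev_image = best.image

    return best


# Toy stand-ins so the sketch runs without any model. The "ideal image" is 1.0.
def toy_generate(prompt, seed, prev_image, reflection):
    noise = random.Random(seed).random()
    if prev_image is None:
        return 0.5 * noise  # first pass: no previous image to condition on
    # Later passes: conditioned on the previous image, move part of the way to 1.0.
    return prev_image + (1.0 - prev_image) * 0.5 * noise


def toy_score(prompt, image):
    return -abs(1.0 - image)  # higher is better, 0.0 is perfect


def toy_reflect(prompt, image):
    return f"image is {1.0 - image:.2f} short of the target"


def toy_refine_prompt(prompt, reflection):
    return f"{prompt} (fix: {reflection})"


if __name__ == "__main__":
    result = reflection_loop(
        "a red cube on a blue sphere",
        toy_generate, toy_score, toy_reflect, toy_refine_prompt,
    )
    print(result)
```

In this sketch the best candidate is only ever replaced by a strictly better one, so adding rounds cannot lower the final score. Whether a real verifier and generator behave this way is an empirical question the toy does not address.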
