CoRe^2: Collect, Reflect and Refine to Generate Better and Faster
March 12, 2025
Authors: Shitong Shao, Zikai Zhou, Dian Xie, Yuetong Fang, Tian Ye, Lichen Bai, Zeke Xie
cs.AI
Abstract
Making text-to-image (T2I) generative models sample both fast and well
represents a promising research direction. Previous studies have typically
focused on either enhancing the visual quality of synthesized images at the
expense of sampling efficiency or dramatically accelerating sampling without
improving the base model's generative capacity. Moreover, hardly any inference
method has been able to ensure stable performance simultaneously on both
diffusion models (DMs) and visual autoregressive models (ARMs). In this paper,
we introduce a novel plug-and-play inference paradigm, CoRe^2, which comprises
three subprocesses: Collect, Reflect, and Refine. CoRe^2 first collects
classifier-free guidance (CFG) trajectories and then uses the collected data to
train a weak model that reflects the easy-to-learn content while halving the
number of function evaluations during inference. Subsequently, CoRe^2
employs weak-to-strong guidance to refine the conditional output, thereby
improving the model's capacity to generate high-frequency and realistic
content, which is difficult for the base model to capture. To the best of our
knowledge, CoRe^2 is the first to demonstrate both efficiency and effectiveness
across a wide range of DMs, including SDXL, SD3.5, and FLUX, as well as ARMs
like LlamaGen. It has exhibited significant performance improvements on HPD v2,
Pick-a-Pic, Drawbench, GenEval, and T2I-Compbench. Furthermore, CoRe^2 can be
seamlessly integrated with the state-of-the-art Z-Sampling, outperforming it by
0.3 on PickScore and 0.16 on AES while saving 5.64 s of inference time with
SD3.5. Code is released at https://github.com/xie-lab-ml/CoRe/tree/main.
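As a rough illustration of the refine stage described in the abstract, the sketch below shows how weak-to-strong guidance could look at a single sampling step: the weak model's prediction takes the role the unconditional branch plays in classifier-free guidance, and the base model's conditional prediction is extrapolated away from it. The function name, tensor shapes, and the guidance scale `w` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def weak_to_strong_guidance(strong_cond: torch.Tensor,
                            weak_pred: torch.Tensor,
                            w: float = 1.5) -> torch.Tensor:
    """Illustrative weak-to-strong refinement for one sampling step.

    strong_cond: conditional prediction (e.g., noise or velocity) from the base model
    weak_pred:   prediction from the weak model trained on collected CFG trajectories
    w:           guidance scale (hypothetical value; the paper may use a different
                 schedule or formulation)
    """
    # Analogous to classifier-free guidance, but the unconditional branch is
    # replaced by the cheap weak model, so only one base-model evaluation is
    # needed per step.
    return weak_pred + w * (strong_cond - weak_pred)
```

In a full sampler this combination would replace the usual CFG step, which is consistent with the abstract's claim that the unconditional pass can be dropped and the number of base-model function evaluations halved.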