Visual Generation Without Guidance
January 26, 2025
Authors: Huayu Chen, Kai Jiang, Kaiwen Zheng, Jianfei Chen, Hang Su, Jun Zhu
cs.AI
Abstract
Classifier-Free Guidance (CFG) has been a default technique in various visual
generative models, yet it requires inference from both conditional and
unconditional models during sampling. We propose to build visual models that
are free from guided sampling. The resulting algorithm, Guidance-Free Training
(GFT), matches the performance of CFG while reducing sampling to a single
model, halving the computational cost. Unlike previous distillation-based
approaches that rely on pretrained CFG networks, GFT enables training directly
from scratch. GFT is simple to implement. It retains the same maximum
likelihood objective as CFG and differs mainly in the parameterization of
conditional models. Implementing GFT requires only minimal modifications to
existing codebases, as most design choices and hyperparameters are directly
inherited from CFG. Our extensive experiments across five distinct visual
models demonstrate the effectiveness and versatility of GFT. Across domains of
diffusion, autoregressive, and masked-prediction modeling, GFT consistently
achieves comparable or even lower FID scores, with similar diversity-fidelity
trade-offs compared with CFG baselines, all while being guidance-free. Code
will be available at https://github.com/thu-ml/GFT.
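The computational saving the abstract describes can be illustrated with a minimal sketch. Below, `model` is a hypothetical stand-in for a network forward pass (not the paper's actual architecture); the point is only that standard CFG sampling needs two forward passes per step, combined as `eps_uncond + w * (eps_cond - eps_uncond)`, while a guidance-free model of the kind GFT trains needs one.

```python
def model(x, cond):
    # Hypothetical toy "network": in practice this would be a diffusion
    # model's noise prediction. Shown only to count forward passes.
    return 0.1 * x + (0.05 if cond is not None else 0.0)

def cfg_step(x, cond, w=3.0):
    """Classifier-Free Guidance: TWO forward passes per sampling step."""
    eps_cond = model(x, cond)      # conditional pass
    eps_uncond = model(x, None)    # unconditional pass
    # standard CFG combination with guidance scale w
    return eps_uncond + w * (eps_cond - eps_uncond)

def guidance_free_step(x, cond):
    """Guidance-free sampling (what GFT enables): ONE forward pass."""
    return model(x, cond)
```

With a guidance scale `w`, CFG interpolates (or extrapolates, for `w > 1`) between the unconditional and conditional predictions; GFT instead trains a single conditional model whose one-pass output already matches the guided distribution, halving per-step inference cost.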