Efficient Distillation of Classifier-Free Guidance using Adapters

March 10, 2025
作者: Cristian Perez Jensen, Seyedmorteza Sadat
cs.AI

Abstract

While classifier-free guidance (CFG) is essential for conditional diffusion models, it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD leverages lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and only trains minimal additional parameters (~2%) to significantly reduce the resource requirement of the distillation phase. Additionally, this approach preserves the original model weights and enables the adapters to be seamlessly combined with other checkpoints derived from the same base model. We also address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (~2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.
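The contrast the abstract draws — two forward passes under standard CFG versus one pass with a trained adapter — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the toy "base model" is a single frozen matrix, the low-rank adapter (`A`, `B`) and all function names are hypothetical, and no actual distillation training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a frozen base model weight and a small
# low-rank adapter (the only trainable parameters in AGD's setup).
W_base = rng.standard_normal((4, 4))      # frozen during distillation
A = rng.standard_normal((4, 2)) * 0.01    # adapter down-projection (trainable)
B = rng.standard_normal((2, 4)) * 0.01    # adapter up-projection (trainable)

def base_forward(x, cond):
    """One neural function evaluation (NFE) of the frozen base model."""
    return (x + cond) @ W_base

def cfg(x, cond, w=3.0):
    """Standard classifier-free guidance: TWO NFEs per sampling step."""
    eps_cond = base_forward(x, cond)                     # conditional pass
    eps_uncond = base_forward(x, np.zeros_like(cond))    # unconditional pass
    return eps_uncond + w * (eps_cond - eps_uncond)

def agd(x, cond):
    """Adapter-style guidance: ONE NFE. The adapter's low-rank residual
    would be trained (on CFG-guided trajectories) to mimic cfg()'s output."""
    return base_forward(x, cond) + (x + cond) @ A @ B

x = rng.standard_normal(4)
cond = rng.standard_normal(4)
```

With guidance scale w = 1, CFG reduces to the plain conditional prediction, which makes the extrapolation structure of the formula easy to check; the adapter path replaces the second (unconditional) pass with a cheap residual, halving NFEs per step.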
