

One-step Diffusion Models with f-Divergence Distribution Matching

February 21, 2025
Authors: Yilun Xu, Weili Nie, Arash Vahdat
cs.AI

Abstract

Sampling from diffusion models involves a slow iterative process that hinders their practical deployment, especially for interactive applications. To accelerate generation speed, recent approaches distill a multi-step diffusion model into a single-step student generator via variational score distillation, which matches the distribution of samples generated by the student to the teacher's distribution. However, these approaches use the reverse Kullback-Leibler (KL) divergence for distribution matching which is known to be mode seeking. In this paper, we generalize the distribution matching approach using a novel f-divergence minimization framework, termed f-distill, that covers different divergences with different trade-offs in terms of mode coverage and training variance. We derive the gradient of the f-divergence between the teacher and student distributions and show that it is expressed as the product of their score differences and a weighting function determined by their density ratio. This weighting function naturally emphasizes samples with higher density in the teacher distribution, when using a less mode-seeking divergence. We observe that the popular variational score distillation approach using the reverse-KL divergence is a special case within our framework. Empirically, we demonstrate that alternative f-divergences, such as forward-KL and Jensen-Shannon divergences, outperform the current best variational score distillation methods across image generation tasks. In particular, when using Jensen-Shannon divergence, f-distill achieves current state-of-the-art one-step generation performance on ImageNet64 and zero-shot text-to-image generation on MS-COCO. Project page: https://research.nvidia.com/labs/genair/f-distill
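To make the key identity from the abstract concrete, the sketch below writes the f-divergence gradient for a one-step student generator x = G_θ(z) with sample density q_θ and teacher density p. It is a simplified, single-noise-level form: the paper applies the identity to noised samples with a time-dependent weighting and estimates the density ratio with an auxiliary network, so the exact expression follows the paper rather than this sketch.

```latex
% Sketch (simplified, single noise level) of the f-divergence gradient used for
% distillation: student generator x = G_\theta(z) with sample density q_\theta,
% teacher density p, density ratio r(x) = p(x)/q_\theta(x), weight h(r) = f''(r) r^2.
\begin{align*}
\nabla_\theta D_f(p \,\|\, q_\theta)
  &= \mathbb{E}_{z}\Big[\, h\big(r(x)\big)\,
     \big(\nabla_x \log q_\theta(x) - \nabla_x \log p(x)\big)\,
     \tfrac{\partial G_\theta(z)}{\partial \theta} \Big]_{x = G_\theta(z)},
     \qquad h(r) = f''(r)\, r^{2}.
\end{align*}
% Weighting functions for a few choices of f:
%   reverse KL      f(r) = -\log r        ->  h(r) = 1             (constant weight)
%   forward KL      f(r) = r \log r       ->  h(r) = r             (up-weights high teacher density)
%   Jensen-Shannon  (mixture-based f)     ->  h(r) = r / (2(1+r))  (bounded, increasing in r)
```

With reverse KL the weight is constant, which is why standard variational score distillation appears as the special case mentioned in the abstract; the forward-KL and Jensen-Shannon weights grow with the density ratio and therefore emphasize samples where the teacher assigns high probability, making these divergences less mode-seeking.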
