Gaussian Mixture Flow Matching Models
April 7, 2025
Authors: Hansheng Chen, Kai Zhang, Hao Tan, Zexiang Xu, Fujun Luan, Leonidas Guibas, Gordon Wetzstein, Sai Bi
cs.AI
Abstract
Diffusion models approximate the denoising distribution as a Gaussian and
predict its mean, whereas flow matching models reparameterize the Gaussian mean
as flow velocity. However, they underperform in few-step sampling due to
discretization error and tend to produce over-saturated colors under
classifier-free guidance (CFG). To address these limitations, we propose a
novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the
mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a
multi-modal flow velocity distribution, which can be learned with a KL
divergence loss. We demonstrate that GMFlow generalizes previous diffusion and
flow matching models where a single Gaussian is learned with an L_2 denoising
loss. For inference, we derive GM-SDE/ODE solvers that leverage analytic
denoising distributions and velocity fields for precise few-step sampling.
Furthermore, we introduce a novel probabilistic guidance scheme that mitigates
the over-saturation issues of CFG and improves image generation quality.
Extensive experiments demonstrate that GMFlow consistently outperforms flow
matching baselines in generation quality, achieving a Precision of 0.942 with
only 6 sampling steps on ImageNet 256×256.
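
To make the training objective concrete, here is a minimal sketch, not the authors' implementation. It assumes the KL divergence loss is optimized in the usual way, by minimizing the negative log-likelihood of sampled target velocities under the predicted mixture (the two differ only by a term independent of the model). The function name `gm_nll`, the tensor shapes, and the choice of a single shared isotropic standard deviation are illustrative assumptions, not details stated in the abstract.

```python
import numpy as np


def gm_nll(u, logits, means, log_std):
    """Mean negative log-likelihood of velocities under an isotropic Gaussian mixture.

    Hypothetical shapes (assumptions for this sketch):
      u:       (B, D)    sampled target flow velocities
      logits:  (B, K)    unnormalized mixture weights
      means:   (B, K, D) component means
      log_std: (B,)      log of one isotropic std shared by all K components
    """
    d = u.shape[-1]
    # Log-softmax over the K mixture components.
    log_w = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
    # Squared distance from each target velocity to each component mean: (B, K).
    sq_dist = np.sum((u[:, None, :] - means) ** 2, axis=-1)
    var = np.exp(2.0 * log_std)[:, None]
    # Log-density of an isotropic Gaussian in D dimensions, per component.
    log_comp = (
        -0.5 * sq_dist / var
        - d * log_std[:, None]
        - 0.5 * d * np.log(2.0 * np.pi)
    )
    # Log-sum-exp over components gives the mixture log-density.
    log_mix = np.logaddexp.reduce(log_w + log_comp, axis=-1)
    return -log_mix.mean()


# With K = 1 and a fixed unit std, the loss is 0.5 * ||u - mean||^2 plus a constant.
u = np.array([[1.0, 2.0]])
single = gm_nll(u, np.zeros((1, 1)), np.zeros((1, 1, 2)), np.zeros(1))
print(single)
```

With one component and a fixed standard deviation, this loss is the squared error plus a constant, which matches the abstract's statement that GMFlow generalizes models trained with an L_2 denoising loss.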