Weak-to-Strong Diffusion with Reflection
February 1, 2025
Authors: Lichen Bai, Masashi Sugiyama, Zeke Xie
cs.AI
Abstract
The goal of diffusion generative models is to align the learned distribution
with the real data distribution through gradient score matching. However,
inherent limitations in training data quality, modeling strategies, and
architectural design lead to an inevitable gap between generated outputs and real
data. To reduce this gap, we propose Weak-to-Strong Diffusion (W2SD), a novel
framework that utilizes the estimated difference between existing weak and
strong models (i.e., weak-to-strong difference) to approximate the gap between
an ideal model and a strong model. By employing a reflective operation that
alternates between denoising and inversion with the weak-to-strong difference, we
show theoretically that W2SD steers latent variables along sampling
trajectories toward regions of the real data distribution. W2SD is highly
flexible and broadly applicable, enabling diverse improvements through the
strategic selection of weak-to-strong model pairs (e.g., DreamShaper vs. SD1.5,
good experts vs. bad experts in MoE). Extensive experiments demonstrate that
W2SD significantly improves human preference, aesthetic quality, and prompt
adherence, achieving SOTA performance across various modalities (e.g., image,
video), architectures (e.g., UNet-based, DiT-based, MoE), and benchmarks. For
example, Juggernaut-XL with W2SD can achieve an HPSv2 winning rate of up to
90% over the original results. Moreover, the performance gains achieved by W2SD
markedly outweigh its additional computational overhead, while the cumulative
improvements from different weak-to-strong differences further solidify its
practical utility and deployability.
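
The reflective operation described above can be illustrated with a deliberately simplified sketch. This is a one-dimensional toy, not the paper's algorithm: the strong and weak "models" are assumed to be Gaussian score functions with hypothetical means, and the denoising and inversion steps are assumed to be plain Euler updates. It only shows why denoising with a strong model and then inverting with a weak model nudges a latent in the weak-to-strong direction.

```python
def gaussian_score(x, mean, sigma=1.0):
    """Score (gradient of the log-density) of a 1D Gaussian N(mean, sigma^2)."""
    return (mean - x) / sigma**2


def reflect(x, strong_mean, weak_mean, step=0.1):
    """One toy reflection: denoise with the strong model, then invert with the weak one.

    Assumption: both models are Gaussian scores and both steps are Euler updates.
    """
    # Denoising step: move along the strong model's score.
    x_denoised = x + step * gaussian_score(x, strong_mean)
    # Inversion step: move against the weak model's score.
    return x_denoised - step * gaussian_score(x_denoised, weak_mean)


# Hypothetical setup: the weak model is centred at 0.5, the strong model at 1.0.
x = 0.0
for _ in range(5):
    x = reflect(x, strong_mean=1.0, weak_mean=0.5)
print(x)
```

To first order in the step size, each reflection shifts the latent by `step * (strong_mean - weak_mean)`, so the weak-to-strong difference acts as the steering signal. In this toy, a latent that already sits at the strong model's mean is pushed further away from the weak model's mean.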