Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
February 17, 2025
Authors: Ye Tian, Ling Yang, Xinchen Zhang, Yunhai Tong, Mengdi Wang, Bin Cui
cs.AI
Abstract
We propose Diffusion-Sharpening, a fine-tuning approach that enhances
downstream alignment by optimizing sampling trajectories. Existing RL-based
fine-tuning methods focus on single training timesteps and neglect
trajectory-level alignment, while recent sampling trajectory optimization
methods incur significant inference NFE costs. Diffusion-Sharpening overcomes
this by using a path integral framework to select optimal trajectories during
training, leveraging reward feedback, and amortizing inference costs. Our
method demonstrates superior training efficiency with faster convergence and
the best inference efficiency, requiring no additional NFEs. Extensive
experiments show that Diffusion-Sharpening outperforms RL-based fine-tuning
methods (e.g., Diffusion-DPO) and sampling trajectory optimization methods
(e.g., Inference Scaling) across diverse metrics including text alignment,
compositional capabilities, and human preferences, offering a scalable and
efficient solution for future diffusion model fine-tuning. Code:
https://github.com/Gen-Verse/Diffusion-Sharpening
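To make the trajectory-sharpening idea concrete, here is a minimal toy sketch of reward-guided trajectory selection: at each denoising step, several candidate next states are branched and the highest-reward one is kept. This greedy selection is only an illustrative stand-in for the paper's path-integral trajectory selection; the denoiser, the reward function, and all names below are hypothetical, not the authors' implementation.

```python
import numpy as np

def denoise_step(x, rng, noise_scale=0.3):
    # Toy "denoiser": deterministic drift toward the origin plus
    # stochastic exploration noise (stands in for one sampler step).
    return x - 0.05 * x + noise_scale * rng.normal(size=x.shape)

def reward(x):
    # Hypothetical reward model: prefer samples close to the target (1, 1).
    return -float(np.sum((x - 1.0) ** 2))

def sharpened_trajectory(seed=0, steps=20, branches=4):
    """Branch `branches` candidate next states at every denoising step and
    keep the one with the highest reward -- a greedy proxy for selecting
    the optimal trajectory under reward feedback during training."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=2)  # initial noisy sample
    for _ in range(steps):
        candidates = [denoise_step(x, rng) for _ in range(branches)]
        x = max(candidates, key=reward)  # trajectory-level selection
    return x
```

Because the selection happens while generating training targets, the extra branching cost is paid once during fine-tuning and amortized away at inference time, which is the efficiency argument the abstract makes against inference-time trajectory search.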