FreSca: Unveiling the Scaling Space in Diffusion Models

Abstract

Diffusion models offer impressive controllability for image tasks, primarily through noise predictions that encode task-specific information and through classifier-free guidance, which enables adjustable scaling. This scaling mechanism implicitly defines a "scaling space" whose potential for fine-grained semantic manipulation remains underexplored. We investigate this space, starting with inversion-based editing, where the difference between the conditional and unconditional noise predictions carries key semantic information. Our core contribution stems from a Fourier analysis of noise predictions, which reveals that their low- and high-frequency components evolve differently throughout the diffusion process. Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain. FreSca demonstrably enhances existing image editing methods without retraining. Its effectiveness also extends to image understanding tasks such as depth estimation, yielding quantitative gains across multiple datasets.
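The frequency-dependent scaling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `frequency_scaled_guidance`, the square low-pass mask, and the `low_scale`, `high_scale`, and `cutoff` parameters are all assumptions chosen for illustration. The sketch takes the difference between the conditional and unconditional noise predictions, splits it into low- and high-frequency bands in the Fourier domain, and scales each band separately before adding it back to the unconditional prediction.

```python
import numpy as np


def frequency_scaled_guidance(eps_cond, eps_uncond,
                              low_scale=1.0, high_scale=1.0, cutoff=4):
    """Scale the guidance term separately in two frequency bands.

    Hypothetical helper for illustration only. The last two axes of the
    inputs are treated as the spatial (height, width) axes.
    """
    # Guidance direction: conditional minus unconditional prediction.
    delta = eps_cond - eps_uncond
    h, w = delta.shape[-2:]

    # 2D FFT over the spatial axes, shifted so zero frequency is centered.
    spectrum = np.fft.fftshift(np.fft.fft2(delta, axes=(-2, -1)),
                               axes=(-2, -1))

    # Assumed band split: a centered square window marks "low" frequencies.
    cy, cx = h // 2, w // 2
    low_mask = np.zeros((h, w), dtype=bool)
    low_mask[max(cy - cutoff, 0):cy + cutoff + 1,
             max(cx - cutoff, 0):cx + cutoff + 1] = True

    # Per-frequency scale: low_scale inside the window, high_scale outside.
    scale = np.where(low_mask, low_scale, high_scale)
    scaled = spectrum * scale

    # Back to the spatial domain; the imaginary part is numerical noise
    # because the mask is symmetric about zero frequency.
    delta_scaled = np.fft.ifft2(np.fft.ifftshift(scaled, axes=(-2, -1)),
                                axes=(-2, -1)).real
    return eps_uncond + delta_scaled


# Usage with random stand-ins for the two noise predictions.
rng = np.random.default_rng(0)
eps_c = rng.standard_normal((4, 64, 64))
eps_u = rng.standard_normal((4, 64, 64))
eps = frequency_scaled_guidance(eps_c, eps_u, low_scale=1.0, high_scale=2.0)
```

When `low_scale` and `high_scale` are equal, this sketch reduces to ordinary classifier-free guidance with a single scale, which makes the separate per-band scales the only new degree of freedom.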
