A Noise is Worth Diffusion Guidance

December 5, 2024
作者: Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak Paul, SeonHwa Kim, Eunju Cha, Kyong Hwan Jin, Seungryong Kim
cs.AI

Abstract
Diffusion models excel at generating high-quality images. However, current diffusion models struggle to produce reliable images without guidance methods, such as classifier-free guidance (CFG). Are guidance methods truly necessary? Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline. By mapping Gaussian noise to "guidance-free noise", we uncover that small low-magnitude low-frequency components significantly enhance the denoising process, removing the need for guidance and thus improving both inference throughput and memory efficiency. Expanding on this, we propose NoiseRefine, a novel method that replaces guidance methods with a single refinement of the initial noise. This refined noise enables high-quality image generation without guidance, within the same diffusion pipeline. Our noise-refining model leverages efficient noise-space learning, achieving rapid convergence and strong performance with just 50K text-image pairs. We validate its effectiveness across diverse metrics and analyze how refined noise can eliminate the need for guidance. See our project page: https://cvlab-kaist.github.io/NoiseRefine/.
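The abstract's key observation is that small low-magnitude low-frequency components in the initial noise drive the guidance-free behavior. The sketch below is a hypothetical illustration (not the authors' code) of how one might isolate such components: it splits a 2-D Gaussian noise map into low- and high-frequency parts with an FFT low-pass mask. The `cutoff` radius is an assumed parameter; the paper does not specify one here.

```python
import numpy as np

def split_frequency(noise: np.ndarray, cutoff: int = 8):
    """Split a 2-D noise map into low- and high-frequency parts.

    `cutoff` is an assumed radius (in frequency bins) for the
    low-pass mask, chosen only for illustration.
    """
    h, w = noise.shape
    # Centered 2-D spectrum of the noise map.
    spectrum = np.fft.fftshift(np.fft.fft2(noise))
    # Circular low-pass mask around the DC component.
    yy, xx = np.mgrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    mask = dist <= cutoff
    # Invert only the retained low frequencies; the remainder is high-frequency.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = noise - low
    return low, high

rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))
low, high = split_frequency(noise)
# For a small cutoff, the low-frequency part carries only a small
# fraction of the total energy, consistent with "low-magnitude".
print(np.linalg.norm(low) < np.linalg.norm(high))  # True
```

By Parseval's theorem, the energy of the low-pass part is proportional to the number of frequency bins retained, so a small cutoff yields a component that is both low-frequency and low-magnitude, matching the property the abstract highlights.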
