A Noise is Worth Diffusion Guidance

December 5, 2024
Authors: Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak Paul, SeonHwa Kim, Eunju Cha, Kyong Hwan Jin, Seungryong Kim
cs.AI

Abstract

Diffusion models excel at generating high-quality images. However, current diffusion models struggle to produce reliable images without guidance methods, such as classifier-free guidance (CFG). Are guidance methods truly necessary? Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline. By mapping Gaussian noise to "guidance-free noise", we uncover that small low-magnitude low-frequency components significantly enhance the denoising process, removing the need for guidance and thus improving both inference throughput and memory usage. Expanding on this, we propose NoiseRefine, a novel method that replaces guidance methods with a single refinement of the initial noise. This refined noise enables high-quality image generation without guidance, within the same diffusion pipeline. Our noise-refining model leverages efficient noise-space learning, achieving rapid convergence and strong performance with just 50K text-image pairs. We validate its effectiveness across diverse metrics and analyze how refined noise can eliminate the need for guidance. See our project page: https://cvlab-kaist.github.io/NoiseRefine/.
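The contrast the abstract draws can be sketched numerically: classifier-free guidance costs two model evaluations per denoising step, while the proposed approach instead refines the initial noise once and then samples without guidance. The code below is a minimal toy illustration, not the paper's implementation: the random arrays stand in for a real diffusion model's noise predictions, and `refine_noise` is a hypothetical stand-in that merely adds a small low-magnitude low-frequency component, echoing the abstract's observation.

```python
import numpy as np

def cfg_combine(eps_cond, eps_uncond, w):
    """Classifier-free guidance: each step needs TWO model evaluations
    (conditional and unconditional), combined as
    eps_uncond + w * (eps_cond - eps_uncond)."""
    return eps_uncond + w * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
# Toy stand-ins for the model's conditional / unconditional predictions.
eps_c = rng.standard_normal((4, 4))
eps_u = rng.standard_normal((4, 4))
guided = cfg_combine(eps_c, eps_u, w=7.5)

def refine_noise(x_T, low_freq_scale=0.1):
    """Hypothetical one-shot refinement of the initial Gaussian noise:
    add a small low-magnitude low-frequency component, which the paper
    reports is what removes the need for per-step guidance."""
    freq = np.fft.fft2(x_T)
    mask = np.zeros_like(freq)
    mask[:2, :2] = 1.0          # keep only the lowest frequencies
    low_freq = np.real(np.fft.ifft2(freq * mask))
    return x_T + low_freq_scale * low_freq

x_T = rng.standard_normal((4, 4))   # initial Gaussian noise
x_T_refined = refine_noise(x_T)     # refine ONCE, then sample guidance-free
```

Note the cost asymmetry this sketch highlights: CFG pays the second model evaluation at every denoising step, whereas noise refinement is a single extra pass before sampling begins, which is where the abstract's throughput and memory gains come from.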

