

FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

November 27, 2024
作者: Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martinez
cs.AI

Abstract

Diffusion models are proficient at generating high-quality images. However, they are effective only when operating at the resolution used during training; inference at a scaled resolution leads to repetitive patterns and structural distortions. Retraining at higher resolutions quickly becomes prohibitive, so methods that enable pre-existing diffusion models to operate at flexible test-time resolutions are highly desirable. Previous works suffer from frequent artifacts and often introduce large latency overheads. We propose two simple modules that combine to solve these issues: a Frequency Modulation (FM) module that leverages the Fourier domain to improve global structure consistency, and an Attention Modulation (AM) module that improves the consistency of local texture patterns, a problem largely ignored in prior works. Our method, coined FAM diffusion, integrates seamlessly into any latent diffusion model and requires no additional training. Extensive qualitative results highlight the effectiveness of our method in addressing structural and local artifacts, while quantitative results show state-of-the-art performance. Moreover, our method avoids redundant inference tricks for improved consistency, such as patch-based or progressive generation, leading to negligible latency overheads.
