FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
November 27, 2024
Authors: Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martinez
cs.AI
Abstract
Diffusion models are proficient at generating high-quality images. However,
they are effective only when operating at the resolution used during training.
Inference at a scaled resolution leads to repetitive patterns and structural
distortions. Retraining at higher resolutions quickly becomes prohibitive.
Thus, methods enabling pre-existing diffusion models to operate at flexible
test-time resolutions are highly desirable. Previous works suffer from frequent
artifacts and often introduce large latency overheads. We propose two simple
modules that combine to solve these issues. We introduce a Frequency Modulation
(FM) module that leverages the Fourier domain to improve the global structure
consistency, and an Attention Modulation (AM) module which improves the
consistency of local texture patterns, a problem largely ignored in prior
works. Our method, coined FAM Diffusion, can seamlessly integrate into any
latent diffusion model and requires no additional training. Extensive
qualitative results highlight the effectiveness of our method in addressing
structural and local artifacts, while quantitative results show
state-of-the-art performance. Also, our method avoids redundant inference
tricks for improved consistency such as patch-based or progressive generation,
leading to negligible latency overheads.
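
The abstract says only that the Frequency Modulation (FM) module leverages the Fourier domain to improve global structure consistency; it gives no formulation. As a rough illustration of the general idea, the sketch below blends two same-sized 2D arrays in the Fourier domain, taking low frequencies (global structure) from a reference and high frequencies (fine detail) from the current array. The function names, the hard circular low-pass mask, and the `cutoff` parameter are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def low_pass_mask(height, width, cutoff):
    """Boolean mask that keeps frequencies with radius <= cutoff.

    Frequencies are in cycles per sample, so each axis spans [-0.5, 0.5).
    The hard circular mask is an assumption made for this sketch.
    """
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fy ** 2 + fx ** 2) <= cutoff

def frequency_blend(reference, current, cutoff=0.1):
    """Combine low frequencies of `reference` with high frequencies of `current`.

    Both inputs must share the same shape; the FFT runs over the last two axes.
    """
    if reference.shape != current.shape:
        raise ValueError("reference and current must have the same shape")
    mask = low_pass_mask(reference.shape[-2], reference.shape[-1], cutoff)
    ref_f = np.fft.fft2(reference)
    cur_f = np.fft.fft2(current)
    # Inside the mask use the reference spectrum, outside it the current one.
    blended = np.where(mask, ref_f, cur_f)
    # The mask is symmetric in frequency, so the imaginary part is round-off only.
    return np.real(np.fft.ifft2(blended))
```

In a high-resolution setting, `reference` could be an upsampled copy of a native-resolution result and `current` the high-resolution latent being denoised, although that pairing is also an assumption here. A `cutoff` of 0 keeps only the mean of the reference, while a cutoff above roughly 0.71 returns the reference unchanged.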