Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
October 3, 2024
Authors: Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber
cs.AI
Abstract
Classifier-free guidance (CFG) is crucial for improving both generation
quality and alignment between the input condition and final output in diffusion
models. While a high guidance scale is generally required to enhance these
aspects, it also causes oversaturation and unrealistic artifacts. In this
paper, we revisit the CFG update rule and introduce modifications to address
this issue. We first decompose the update term in CFG into parallel and
orthogonal components with respect to the conditional model prediction and
observe that the parallel component primarily causes oversaturation, while the
orthogonal component enhances image quality. Accordingly, we propose
down-weighting the parallel component to achieve high-quality generations
without oversaturation. Additionally, we draw a connection between CFG and
gradient ascent and introduce a new rescaling and momentum method for the CFG
update rule based on this insight. Our approach, termed adaptive projected
guidance (APG), retains the quality-boosting advantages of CFG while enabling
the use of higher guidance scales without oversaturation. APG is easy to
implement and introduces practically no additional computational overhead to
the sampling process. Through extensive experiments, we demonstrate that APG is
compatible with various conditional diffusion models and samplers, leading to
improved FID, recall, and saturation scores while maintaining precision
comparable to CFG, making our method a superior plug-and-play alternative to
standard classifier-free guidance.
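The abstract describes APG in words only. What follows is a minimal sketch of one reading of its three ingredients. It is not the authors' implementation. The three ingredients are:

* splitting the guidance update into components parallel and orthogonal to the conditional prediction, then down-weighting the parallel part;
* rescaling the update, treated here like gradient-norm clipping;
* a momentum term, treated here like heavy-ball accumulation.

The sketch makes several assumptions:

* It uses the CFG convention `D_cfg = D_cond + (w - 1) * (D_cond - D_uncond)`, so `w = 1` recovers the conditional prediction.
* It works on a single flattened prediction stored as a Python list.
* The rescaling clips the update's L2 norm to a threshold.
* The momentum recursion is `avg <- diff + beta * avg`.

The names `project`, `MomentumBuffer`, `apg`, `eta`, `norm_threshold` and `beta` are hypothetical. So are the default values.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]  # one flattened model prediction (assumption: a single sample)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def project(v: Vector, ref: Vector) -> Tuple[Vector, Vector]:
    """Split v into (parallel, orthogonal) components with respect to ref."""
    ref_norm = norm(ref)
    if ref_norm == 0.0:
        # No direction to project onto: treat the whole vector as orthogonal.
        return [0.0] * len(v), list(v)
    unit = [r / ref_norm for r in ref]
    coeff = dot(v, unit)
    parallel = [coeff * u for u in unit]
    orthogonal = [x - p for x, p in zip(v, parallel)]
    return parallel, orthogonal


class MomentumBuffer:
    """Hypothetical helper: running average avg <- diff + beta * avg across sampling steps."""

    def __init__(self, beta: float) -> None:
        self.beta = beta
        self.average: Optional[Vector] = None

    def update(self, diff: Vector) -> Vector:
        if self.average is None:
            self.average = list(diff)
        else:
            self.average = [d + self.beta * a for d, a in zip(diff, self.average)]
        return self.average


def cfg(cond: Vector, uncond: Vector, w: float) -> Vector:
    """Standard CFG: D_cond + (w - 1) * (D_cond - D_uncond)."""
    return [c + (w - 1.0) * (c - u) for c, u in zip(cond, uncond)]


def apg(
    cond: Vector,
    uncond: Vector,
    w: float,
    eta: float = 0.0,  # weight on the parallel component (assumed default)
    norm_threshold: float = 0.0,  # <= 0 disables rescaling
    momentum: Optional[MomentumBuffer] = None,
) -> Vector:
    """Sketch of projected guidance with optional rescaling and momentum."""
    diff = [c - u for c, u in zip(cond, uncond)]
    if momentum is not None:
        diff = momentum.update(diff)  # accumulate the update across steps
    if norm_threshold > 0.0:
        n = norm(diff)
        if n > norm_threshold:  # clip the update's L2 norm to the threshold
            diff = [d * (norm_threshold / n) for d in diff]
    parallel, orthogonal = project(diff, cond)
    # Keep the orthogonal part and down-weight the parallel part by eta.
    update = [o + eta * p for o, p in zip(orthogonal, parallel)]
    return [c + (w - 1.0) * g for c, g in zip(cond, update)]


if __name__ == "__main__":
    cond, uncond = [1.0, 0.0], [0.0, 1.0]
    print(cfg(cond, uncond, 3.0))  # [3.0, -2.0]
    print(apg(cond, uncond, 3.0))  # [1.0, -2.0]
```

With `eta=1.0` and no rescaling or momentum, `apg` reduces exactly to `cfg`. That makes the parallel weight the one knob separating the two in this sketch. Lowering `eta` removes the part of the update that only amplifies the conditional prediction. This is the component the abstract links to oversaturation.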