Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Abstract

Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.
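The decomposition described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `project` and `projected_guidance`, the down-weighting parameter `eta`, and the choice to flatten a single sample into one vector are all assumptions made for illustration. The rescaling and momentum steps are left out because the abstract does not specify them. The sketch relies only on the standard CFG rule, which can be written as the conditional prediction plus `(scale - 1)` times the difference between the conditional and unconditional predictions.

```python
import numpy as np


def project(update, reference):
    """Split `update` into parts parallel and orthogonal to `reference`.

    Both arrays are treated as flat vectors for one sample (an assumption;
    `reference` must not be all zeros).
    """
    ref = reference.ravel()
    upd = update.ravel()
    coef = np.dot(upd, ref) / np.dot(ref, ref)
    parallel = (coef * ref).reshape(update.shape)
    orthogonal = update - parallel
    return parallel, orthogonal


def projected_guidance(pred_cond, pred_uncond, scale, eta=0.0):
    """Hypothetical guidance step that down-weights the parallel component.

    eta = 1.0 recovers standard CFG; eta < 1.0 shrinks the component of the
    update that is parallel to the conditional prediction.
    """
    update = pred_cond - pred_uncond  # the CFG update term
    parallel, orthogonal = project(update, pred_cond)
    return pred_cond + (scale - 1.0) * (orthogonal + eta * parallel)
```

With `eta=1.0` the function reduces to ordinary classifier-free guidance, so the parameter gives a direct way to compare the two behaviours at the same guidance scale.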

