SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
December 3, 2024
Authors: Viet Nguyen, Anh Aengus Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, Anh Tran
cs.AI
Abstract
Recent approaches have yielded promising results in distilling multi-step
text-to-image diffusion models into one-step ones. The state-of-the-art
efficient distillation technique, i.e., SwiftBrushv2 (SBv2), even surpasses the
teacher model's performance with limited resources. However, our study reveals
its instability when handling different diffusion model backbones due to using
a fixed guidance scale within the Variational Score Distillation (VSD) loss.
Another weakness of existing one-step diffusion models is their lack of
support for negative prompt guidance, which is crucial in practical image
generation. This paper presents SNOOPI, a novel framework designed to address
these limitations by enhancing the guidance in one-step diffusion models during
both training and inference. First, we effectively enhance training stability
through Proper Guidance-SwiftBrush (PG-SB), which employs a random-scale
classifier-free guidance approach. By varying the guidance scale of both
teacher models, we broaden their output distributions, resulting in a more
robust VSD loss that enables SB to perform effectively across diverse backbones
while maintaining competitive performance. Second, we propose a training-free
method called Negative-Away Steer Attention (NASA), which integrates negative
prompts into one-step diffusion models via cross-attention to suppress
undesired elements in generated images. Our experimental results show that our
proposed methods significantly improve baseline models across various metrics.
Remarkably, we achieve an HPSv2 score of 31.08, setting a new state-of-the-art
benchmark for one-step diffusion models.
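
The random-scale classifier-free guidance idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform sampling distribution, and the range [2.0, 8.0] are assumptions chosen for illustration, and the toy lists stand in for the noise predictions a real teacher network would produce. It shows only the standard classifier-free guidance combination, with the scale redrawn at every training step instead of being held fixed.

```python
import random


def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: move the unconditional
    prediction toward the conditional one by a factor of `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sample_guidance_scale(rng, low=2.0, high=8.0):
    """Draw a guidance scale uniformly from [low, high].
    The range is a placeholder, not a value taken from the paper."""
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for the teacher's unconditional and conditional outputs.
    eps_uncond = [0.0, 1.0, -1.0]
    eps_cond = [1.0, 1.0, 0.0]
    for step in range(3):
        scale = sample_guidance_scale(rng)  # a fresh scale at each step
        guided = cfg_combine(eps_uncond, eps_cond, scale)
        print(step, round(scale, 3), guided)
```

With a fixed scale the guided prediction always sits at the same point along the line between the two predictions. Redrawing the scale spreads the targets along that line, which is one way to read the abstract's claim that varying the scale broadens the teachers' output distributions.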