SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

December 3, 2024
Authors: Viet Nguyen, Anh Aengus Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, Anh Tran
cs.AI

Abstract

Recent approaches have yielded promising results in distilling multi-step text-to-image diffusion models into one-step ones. The state-of-the-art efficient distillation technique, i.e., SwiftBrushv2 (SBv2), even surpasses the teacher model's performance with limited resources. However, our study reveals its instability when handling different diffusion model backbones due to using a fixed guidance scale within the Variational Score Distillation (VSD) loss. Another weakness of the existing one-step diffusion models is the missing support for negative prompt guidance, which is crucial in practical image generation. This paper presents SNOOPI, a novel framework designed to address these limitations by enhancing the guidance in one-step diffusion models during both training and inference. First, we effectively enhance training stability through Proper Guidance-SwiftBrush (PG-SB), which employs a random-scale classifier-free guidance approach. By varying the guidance scale of both teacher models, we broaden their output distributions, resulting in a more robust VSD loss that enables SB to perform effectively across diverse backbones while maintaining competitive performance. Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images. Our experimental results show that our proposed methods significantly improve baseline models across various metrics. Remarkably, we achieve an HPSv2 score of 31.08, setting a new state-of-the-art benchmark for one-step diffusion models.
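
The two guidance ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the scale range `[2.0, 8.0]`, the weight `alpha`, and the toy vectors are all assumptions made for illustration. The classifier-free guidance formula is the standard one; the feature-subtraction form used for the negative-prompt step is only one plausible reading of "via cross-attention" and is not confirmed by the abstract.

```python
import random

def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: start from the unconditional
    prediction and move `scale` times the conditional direction."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def sample_guidance_scale(rng, low=2.0, high=8.0):
    """Draw a fresh guidance scale for each training step instead of
    using a fixed one. The range is a placeholder, not from the paper."""
    return rng.uniform(low, high)

def negative_away_steer(feat_pos, feat_neg, alpha=0.5):
    """ASSUMED form of negative-prompt steering: subtract scaled
    cross-attention features of the negative prompt from those of the
    positive prompt. `alpha` is a hypothetical weight."""
    return [p - alpha * n for p, n in zip(feat_pos, feat_neg)]

# Toy stand-ins for teacher noise predictions (hypothetical values).
rng = random.Random(0)
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]

scale = sample_guidance_scale(rng)            # varies per training step
guided = cfg_combine(eps_uncond, eps_cond, scale)

# Toy stand-ins for cross-attention output features (hypothetical values).
steered = negative_away_steer([1.0, 2.0], [2.0, 0.0], alpha=0.5)
```

In this sketch, resampling `scale` at every step is what exposes the student to a wider range of teacher outputs, and the steering step runs only at inference time, so it needs no additional training.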
