

CleanDIFT: Diffusion Features without Noise

December 4, 2024
Authors: Nick Stracke, Stefan Andreas Baumann, Kolja Bauer, Frank Fundel, Björn Ommer
cs.AI

Abstract

Internal features from large-scale pre-trained diffusion models have recently been established as powerful semantic descriptors for a wide range of downstream tasks. Works that use these features generally need to add noise to images before passing them through the model to obtain the semantic features, as the models do not offer the most useful features when given images with little to no noise. We show that this noise has a critical impact on the usefulness of these features that cannot be remedied by ensembling with different random noises. We address this issue by introducing a lightweight, unsupervised fine-tuning method that enables diffusion backbones to provide high-quality, noise-free semantic features. We show that these features readily outperform previous diffusion features by a wide margin in a wide variety of extraction setups and downstream tasks, offering better performance than even ensemble-based methods at a fraction of the cost.
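To make the noising step described in the abstract concrete, below is a minimal sketch of how prior diffusion-feature works extract semantic descriptors: encode the image, add noise at some timestep, run the noisy latent through the U-Net, and read out an intermediate activation via a forward hook. The checkpoint name, timestep, and choice of up-block are illustrative assumptions rather than the paper's exact configuration; CleanDIFT's contribution is a lightweight fine-tuning of the backbone so that the same features can be obtained from the clean, un-noised image.

```python
# Sketch of standard "noisy" diffusion feature extraction (DIFT-style), assuming
# a Stable Diffusion backbone via the diffusers library. Not CleanDIFT itself.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1"  # assumed checkpoint; any SD model works
).to(device)

features = {}

def hook(module, inputs, output):
    # Store the activations of one decoder block as the semantic descriptor.
    features["feat"] = output.detach()

# up_blocks[1] is a commonly used feature layer; treat the index as an assumption.
handle = pipe.unet.up_blocks[1].register_forward_hook(hook)

@torch.no_grad()
def extract_features(image: torch.Tensor, t: int = 261) -> torch.Tensor:
    """image: (1, 3, H, W) in [-1, 1]; t: diffusion timestep setting the noise level."""
    # Encode the image into the VAE latent space.
    latents = pipe.vae.encode(image.to(device)).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor

    # Add noise for timestep t -- the step CleanDIFT makes unnecessary.
    noise = torch.randn_like(latents)
    timestep = torch.tensor([t], device=device)
    noisy_latents = pipe.scheduler.add_noise(latents, noise, timestep)

    # Run the U-Net with an empty text prompt; the hook captures the features.
    tok = pipe.tokenizer("", return_tensors="pt", padding="max_length",
                         max_length=pipe.tokenizer.model_max_length)
    text_emb = pipe.text_encoder(tok.input_ids.to(device))[0]
    pipe.unet(noisy_latents, timestep, encoder_hidden_states=text_emb)
    return features["feat"]
```

Because the noise is random, a single pass gives noisy descriptors; prior works either accept this or average features over several noise samples, which is the ensembling the abstract argues is both costly and insufficient.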

