CleanDIFT: Diffusion Features without Noise
December 4, 2024
Authors: Nick Stracke, Stefan Andreas Baumann, Kolja Bauer, Frank Fundel, Björn Ommer
cs.AI
Abstract
Internal features from large-scale pre-trained diffusion models have recently
been established as powerful semantic descriptors for a wide range of
downstream tasks. Works that use these features generally need to add noise to
images before passing them through the model to obtain the semantic features,
as the models do not offer the most useful features when given images with
little to no noise. We show that this noise has a critical impact on the
usefulness of these features that cannot be remedied by ensembling with
different random noises. We address this issue by introducing a lightweight,
unsupervised fine-tuning method that enables diffusion backbones to provide
high-quality, noise-free semantic features. We show that these features readily
outperform previous diffusion features by a wide margin in a wide variety of
extraction setups and downstream tasks, offering better performance than even
ensemble-based methods at a fraction of the cost.
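The noising step the abstract refers to, which prior diffusion-feature methods apply before feature extraction, can be sketched as the standard DDPM forward process. This is a minimal stdlib-only illustration assuming a linear beta schedule with 1000 timesteps; the function names and schedule parameters are illustrative assumptions, not taken from the paper's code.

```python
import math
import random

def alpha_bar(t, T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) up to timestep t for a
    linear DDPM beta schedule (assumed, not the paper's exact schedule)."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (T - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, t, rng=random):
    """Forward-diffuse a flattened image x0 to timestep t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    with eps drawn i.i.d. from a standard normal."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for v in x0]
```

At small t the image is nearly clean (alpha_bar close to 1) and at large t it is mostly noise, which is why feature quality depends heavily on the chosen timestep and noise sample; the paper's fine-tuning removes the need to pick either.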