Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
April 15, 2025
Authors: Ziqi Pang, Xin Xu, Yu-Xiong Wang
cs.AI
Abstract
With the success of image generation, generative diffusion models are
increasingly adopted for discriminative tasks, as pixel generation provides a
unified perception interface. However, directly repurposing the generative
denoising process for discriminative objectives reveals critical gaps rarely
addressed previously. Generative models tolerate intermediate sampling errors
if the final distribution remains plausible, but discriminative tasks require
rigorous accuracy throughout, as evidenced in challenging multi-modal tasks
like referring image segmentation. Motivated by this gap, we analyze and
enhance alignment between generative diffusion processes and perception tasks,
focusing on how perception quality evolves during denoising. We find: (1)
earlier denoising steps contribute disproportionately to perception quality,
prompting us to propose tailored learning objectives reflecting varying
timestep contributions; (2) later denoising steps show unexpected perception
degradation, highlighting sensitivity to training-denoising distribution
shifts, addressed by our diffusion-tailored data augmentation; and (3)
generative processes uniquely enable interactivity, serving as controllable
user interfaces adaptable to correctional prompts in multi-round interactions.
Our insights significantly improve diffusion-based perception models without
architectural changes, achieving state-of-the-art performance on depth
estimation, referring image segmentation, and generalist perception tasks. Code
available at https://github.com/ziqipang/ADDP.
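
As a rough illustration of finding (1), the sketch below shows one way a training objective could weight timesteps unevenly so that earlier denoising steps (the noisiest ones) count more. This is not the authors' objective, which the abstract does not specify. The function names, the power-law form, and the `gamma` parameter are all hypothetical, and the indexing convention (larger `t` means more noise) is an assumption.

```python
import random


def timestep_weights(num_steps: int, gamma: float = 2.0) -> list[float]:
    """Hypothetical power-law weights over timesteps.

    Assumed convention: t = num_steps - 1 is the noisiest timestep, i.e. the
    one visited first during denoising, so it receives the largest weight.
    The weights are normalized to sum to 1.
    """
    raw = [((t + 1) / num_steps) ** gamma for t in range(num_steps)]
    total = sum(raw)
    return [r / total for r in raw]


def sample_timestep(weights: list[float], rng: random.Random) -> int:
    """Draw a training timestep with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def weighted_loss(per_step_losses: list[float], weights: list[float]) -> float:
    """Combine per-timestep losses into one scalar using the weights."""
    return sum(w * l for w, l in zip(weights, per_step_losses))


if __name__ == "__main__":
    w = timestep_weights(num_steps=10, gamma=2.0)
    rng = random.Random(0)
    print("weights:", [round(x, 3) for x in w])
    print("sampled timestep:", sample_timestep(w, rng))
```

Either mechanism (sampling timesteps non-uniformly or reweighting the per-timestep loss) shifts training emphasis toward the early denoising steps; which one, if either, matches the paper would need to be checked against the released code.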