Scaling Properties of Diffusion Models for Perceptual Tasks

November 12, 2024
Authors: Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik
cs.AI

Abstract

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and segmentation under image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perception tasks. Through a careful analysis of these scaling behaviors, we present various techniques to efficiently train diffusion models for visual perception tasks. Our models achieve improved or comparable performance to state-of-the-art methods using significantly less data and compute. To use our code and models, see https://scaling-diffusion-perception.github.io.
