Scaling Properties of Diffusion Models for Perceptual Tasks

Abstract

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception tasks. We unify tasks such as depth estimation, optical flow, and segmentation under image-to-image translation, and show how diffusion models benefit from scaling both training and test-time compute for these perception tasks. Through a careful analysis of these scaling behaviors, we present various techniques for efficiently training diffusion models for visual perception tasks. Our models achieve performance that improves on or is comparable to state-of-the-art methods while using significantly less data and compute. Code and models are available at https://scaling-diffusion-perception.github.io.
