One Diffusion to Generate Them All

Abstract

We introduce OneDiffusion, a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding across diverse tasks. It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps, and it also handles image deblurring and upscaling as well as reverse processes such as depth estimation and segmentation. Additionally, OneDiffusion supports multi-view generation, camera pose estimation, and instant personalization using sequential image inputs. Our model takes a straightforward yet effective approach: during training it treats every task as a sequence of frames with varying noise scales, which allows any frame to act as a conditioning image at inference time. This unified training framework removes the need for specialized architectures, supports scalable multi-task training, and adapts smoothly to any resolution, improving both generalization and scalability. Experimental results demonstrate competitive performance on both generation and prediction tasks, including text-to-image generation, multi-view generation, ID preservation, depth estimation, and camera pose estimation, despite a relatively small training dataset. Our code and checkpoint are freely available at https://github.com/lehduong/OneDiffusion
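To make the frame-sequence idea concrete, the following is a minimal sketch of per-frame noising, not the released implementation. It assumes a linear interpolation between each clean frame and Gaussian noise, and a uniform noise-level distribution; neither choice is stated in the abstract. The names `noise_frames` and `cond_mask` are hypothetical, and frames are flattened to short lists of floats for brevity.

```python
import random


def noise_frames(frames, cond_mask, rng):
    """Noise each frame at its own level; conditioning frames stay clean.

    frames:    list of frames, each a flat list of floats.
    cond_mask: list of bools; True marks a conditioning frame (level 0).
    rng:       a random.Random instance.
    Returns (noisy_frames, levels).
    """
    noisy, levels = [], []
    for frame, is_cond in zip(frames, cond_mask):
        # Assumption: each non-conditioning frame draws an independent level in [0, 1).
        t = 0.0 if is_cond else rng.random()
        eps = [rng.gauss(0.0, 1.0) for _ in frame]
        # Assumption: linear interpolation between clean frame and noise.
        noisy.append([(1.0 - t) * x + t * e for x, e in zip(frame, eps)])
        levels.append(t)
    return noisy, levels


rng = random.Random(0)
# Two "frames" of one sample, e.g. an image and its depth map (toy values).
frames = [[0.5, -0.2, 0.1], [1.0, 0.0, -1.0]]
# The first frame conditions; the second is the one to be generated.
noisy, levels = noise_frames(frames, cond_mask=[True, False], rng=rng)
```

Under these assumptions, swapping the mask to `[False, True]` turns the same pair into the reverse task (for example, predicting the image-side frame from the depth-side frame), which is one way to read the claim that any frame can act as the conditioning image.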
