UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
December 10, 2024
Authors: Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
cs.AI
Abstract
We introduce UniReal, a unified framework designed to address various image
generation and editing tasks. Existing solutions often vary by task, yet share
fundamental principles: preserving consistency between inputs and outputs while
capturing visual variations. Inspired by recent video generation models that
effectively balance consistency and variation across frames, we propose a
unifying approach that treats image-level tasks as discontinuous video
generation. Specifically, we treat varying numbers of input and output images
as frames, enabling seamless support for tasks such as image generation,
editing, customization, composition, etc. Although designed for image-level
tasks, we leverage videos as a scalable source for universal supervision.
UniReal learns world dynamics from large-scale videos, demonstrating advanced
capability in handling shadows, reflections, pose variation, and object
interaction, while also exhibiting emergent capability for novel applications.
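The core idea of treating varying numbers of input and output images as frames of a "discontinuous video" can be illustrated with a minimal sketch. The function below is a hypothetical illustration (not the paper's actual implementation): reference images and placeholder output frames are packed along a frame axis, so that a video-style backbone could model consistency and variation across them.

```python
import numpy as np

def pack_as_frames(input_images, num_output_frames, height=64, width=64, channels=3):
    """Hypothetical sketch: pack input images and blank output slots along a
    frame axis, mimicking the 'discontinuous video' formulation."""
    inputs = np.stack(input_images, axis=0)               # (N_in, H, W, C)
    outputs = np.zeros((num_output_frames, height, width, channels),
                       dtype=inputs.dtype)                # placeholder frames
    # Concatenate into a single pseudo-video: (N_in + N_out, H, W, C)
    return np.concatenate([inputs, outputs], axis=0)

# Example: two reference images plus one target output slot -> a 3-frame "video".
refs = [np.random.rand(64, 64, 3).astype(np.float32) for _ in range(2)]
frames = pack_as_frames(refs, num_output_frames=1)
print(frames.shape)  # (3, 64, 64, 3)
```

Under this framing, tasks such as editing (one input frame, one output frame) and customization (several reference frames, one output frame) differ only in how many frames occupy each role, which is what lets one model serve them all.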