UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
December 10, 2024
Authors: Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
cs.AI
Abstract
We introduce UniReal, a unified framework designed to address various image
generation and editing tasks. Existing solutions often vary by task, yet share
fundamental principles: preserving consistency between inputs and outputs while
capturing visual variations. Inspired by recent video generation models that
effectively balance consistency and variation across frames, we propose a
unifying approach that treats image-level tasks as discontinuous video
generation. Specifically, we treat varying numbers of input and output images
as frames, enabling seamless support for tasks such as image generation,
editing, customization, composition, etc. Although designed for image-level
tasks, we leverage videos as a scalable source for universal supervision.
UniReal learns world dynamics from large-scale videos, demonstrating advanced
capability in handling shadows, reflections, pose variation, and object
interaction, while also exhibiting emergent capability for novel applications.
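The core framing above, treating a variable number of input and output images as "frames" of a short, discontinuous video, can be illustrated with a minimal sketch. The function name, shapes, and the use of zero-filled placeholder frames here are illustrative assumptions, not the paper's actual API or training setup:

```python
import numpy as np

def pack_as_frames(input_images, num_outputs, h=256, w=256, c=3):
    """Hypothetical sketch: concatenate the given input images with
    placeholder output frames along a frame axis, yielding a (T, H, W, C)
    "video" in which a model could generate the output frames."""
    inputs = [np.asarray(img, dtype=np.float32) for img in input_images]
    # Zero-filled slots stand in for the frames the model would generate.
    outputs = [np.zeros((h, w, c), dtype=np.float32) for _ in range(num_outputs)]
    return np.stack(inputs + outputs, axis=0)  # shape: (T, H, W, C)

# Example: one reference image plus one edited-output slot -> 2 "frames".
ref = np.ones((256, 256, 3), dtype=np.float32)
frames = pack_as_frames([ref], num_outputs=1)
print(frames.shape)  # (2, 256, 256, 3)
```

Because the frame count T varies with the task, the same packing covers generation (no input frames), editing (one input, one output), and composition (several inputs, one output) without task-specific interfaces.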