
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

December 10, 2024
Authors: Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
cs.AI

Abstract

We introduce UniReal, a unified framework designed to address various image generation and editing tasks. Existing solutions often vary by task, yet share fundamental principles: preserving consistency between inputs and outputs while capturing visual variations. Inspired by recent video generation models that effectively balance consistency and variation across frames, we propose a unifying approach that treats image-level tasks as discontinuous video generation. Specifically, we treat varying numbers of input and output images as frames, enabling seamless support for tasks such as image generation, editing, customization, composition, etc. Although designed for image-level tasks, we leverage videos as a scalable source of universal supervision. UniReal learns world dynamics from large-scale videos, demonstrating advanced capability in handling shadows, reflections, pose variation, and object interaction, while also exhibiting emergent capability for novel applications.
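The core framing above — packing a variable number of input and output images into a single "discontinuous video" — can be sketched as follows. This is a minimal illustration of the data layout only, not UniReal's actual implementation; the function name, shapes, and the noise-placeholder convention for output frames are assumptions for illustration.

```python
import numpy as np

def as_discontinuous_video(inputs, num_outputs, h=64, w=64, c=4):
    """Pack input images plus placeholder output frames into one
    frame sequence, casting an image-level task as 'video' generation.
    Illustrative sketch only; not UniReal's API.
    """
    frames = [np.asarray(img, dtype=np.float32) for img in inputs]
    # Output frames start as noise; a generative model would denoise
    # them while attending to the input frames for consistency.
    frames += [np.random.randn(h, w, c).astype(np.float32)
               for _ in range(num_outputs)]
    video = np.stack(frames, axis=0)  # (T, H, W, C) frame sequence
    # 1 marks a conditioning (input) frame, 0 a frame to be generated
    mask = np.array([1] * len(inputs) + [0] * num_outputs)
    return video, mask

# e.g. an editing task: one input image, one edited output frame
inp = [np.zeros((64, 64, 4), dtype=np.float32)]
video, mask = as_discontinuous_video(inp, num_outputs=1)
print(video.shape, mask.tolist())  # (2, 64, 64, 4) [1, 0]
```

Because every task is just a different count of conditioning versus generated frames, the same sequence layout covers generation (0 inputs), editing (1 input), and multi-image customization or composition (several inputs) without task-specific architectures.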

