

Reangle-A-Video: 4D Video Generation as Video-to-Video Translation

March 12, 2025
Authors: Hyeonho Jeong, Suhyeon Lee, Jong Chul Ye
cs.AI

Abstract

We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video. Unlike mainstream approaches that train multi-view video diffusion models on large-scale 4D datasets, our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors. In essence, Reangle-A-Video operates in two stages. (1) Multi-View Motion Learning: An image-to-video diffusion transformer is synchronously fine-tuned in a self-supervised manner to distill view-invariant motion from a set of warped videos. (2) Multi-View Consistent Image-to-Images Translation: The first frame of the input video is warped and inpainted into various camera perspectives under an inference-time cross-view consistency guidance using DUSt3R, generating multi-view consistent starting images. Extensive experiments on static view transport and dynamic camera control show that Reangle-A-Video surpasses existing methods, establishing a new solution for multi-view video generation. We will publicly release our code and data. Project page: https://hyeonho99.github.io/reangle-a-video/
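The two-stage pipeline described above can be sketched as a toy program. Everything here is a hypothetical stand-in chosen for illustration: a video is an `(T, H, W, 3)` NumPy array, `warp_to_view` replaces real geometric warping with a horizontal pixel shift, and `inpaint` replaces the diffusion inpainter with DUSt3R-based cross-view consistency guidance by simple mean-filling. None of these functions come from the paper's released code.

```python
import numpy as np

def warp_to_view(frames: np.ndarray, shift: int) -> np.ndarray:
    """Hypothetical stand-in for warping a video into a new camera view:
    a horizontal pixel shift, with zeros where content is disoccluded."""
    T, H, W, C = frames.shape
    warped = np.zeros_like(frames)
    if shift >= 0:
        warped[:, :, shift:, :] = frames[:, :, :W - shift, :]
    else:
        warped[:, :, :W + shift, :] = frames[:, :, -shift:, :]
    return warped

def inpaint(image: np.ndarray, hole_mask: np.ndarray) -> np.ndarray:
    """Hypothetical inpainting stub that fills masked pixels with the mean of
    the visible pixels. The paper instead uses diffusion-based inpainting with
    inference-time cross-view consistency guidance via DUSt3R."""
    filled = image.copy()
    filled[hole_mask] = image[~hole_mask].mean(axis=0)
    return filled

def reangle_a_video_sketch(video: np.ndarray, view_shifts: list[int]):
    # Stage 1 (not modeled here): the warped videos would be used to
    # synchronously fine-tune an image-to-video diffusion transformer in a
    # self-supervised manner, distilling view-invariant motion.
    warped_videos = [warp_to_view(video, s) for s in view_shifts]

    # Stage 2: warp and inpaint the input video's first frame into each target
    # view, producing multi-view consistent starting images that would seed
    # the fine-tuned image-to-video generator.
    start_images = []
    for warped in warped_videos:
        first = warped[0]
        hole = first.sum(axis=-1) == 0  # disoccluded (zero-filled) pixels
        start_images.append(inpaint(first, hole))
    return warped_videos, start_images
```

The sketch only conveys the data flow: per-view warped videos feed the motion-learning stage, while the warped-and-inpainted first frames provide the consistent starting images for generation.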

