Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
April 15, 2025
Authors: Jiaxin Huang, Sheng Miao, Bangbang Yang, Yuewen Ma, Yiyi Liao
cs.AI
Abstract
Reconstructing 4D dynamic scenes from casually captured monocular videos is
valuable but highly challenging, as each timestamp is observed from a single
viewpoint. We introduce Vivid4D, a novel approach that enhances 4D monocular
video synthesis by augmenting observation views: synthesizing multi-view
videos from a monocular input. Unlike existing methods that either solely
leverage geometric priors for supervision or use generative priors while
overlooking geometry, we integrate both. This reformulates view augmentation as
a video inpainting task, where observed views are warped into new viewpoints
based on monocular depth priors. To achieve this, we train a video inpainting
model on unposed web videos with synthetically generated masks that mimic
warping occlusions, ensuring spatially and temporally consistent completion of
missing regions. To further mitigate inaccuracies in monocular depth priors, we
introduce an iterative view augmentation strategy and a robust reconstruction
loss. Experiments demonstrate that our method effectively improves monocular 4D
scene reconstruction and completion.
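
The core warping step described in the abstract is concrete enough to sketch. Below is a minimal illustration, not the paper's implementation, of forward-warping an observed frame into a novel viewpoint using a monocular depth map under a pinhole camera model; the disoccluded holes left behind are exactly the regions a video inpainting model would be asked to complete. The function name `warp_to_novel_view`, the intrinsics `K`, and the relative pose `T_rel` are assumptions for illustration.

```python
# A minimal sketch of depth-based forward warping, assuming a pinhole
# camera. Uses nearest-pixel splatting with a z-buffer for clarity; real
# systems typically use softmax splatting or mesh rendering instead.
import numpy as np

def warp_to_novel_view(img, depth, K, T_rel):
    """Forward-warp img (H,W,3) with per-pixel depth (H,W) into a new
    viewpoint given intrinsics K (3,3) and relative pose T_rel (4,4).
    Returns the warped image and a hole mask marking disocclusions,
    i.e. the regions a video inpainting model would complete."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Unproject source pixels into 3D camera coordinates.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    cam = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # Move the points into the target camera frame.
    cam_h = np.concatenate([cam, np.ones((cam.shape[0], 1))], axis=1)
    tgt = (T_rel @ cam_h.T).T[:, :3]
    # Project back to pixel coordinates in the novel view.
    proj = (K @ tgt.T).T
    z = proj[:, 2]
    uv = np.round(proj[:, :2] / np.maximum(z[:, None], 1e-6)).astype(int)
    warped = np.zeros_like(img)
    zbuf = np.full((H, W), np.inf)
    valid = ((uv[:, 0] >= 0) & (uv[:, 0] < W) &
             (uv[:, 1] >= 0) & (uv[:, 1] < H) & (z > 0))
    src = img.reshape(-1, 3)
    for i in np.flatnonzero(valid):  # z-buffered splat (slow but clear)
        x, y = uv[i]
        if z[i] < zbuf[y, x]:
            zbuf[y, x] = z[i]
            warped[y, x] = src[i]
    hole_mask = ~np.isfinite(zbuf)  # disocclusions -> inpainting targets
    return warped, hole_mask
```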
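The abstract does not specify the form of the robust reconstruction loss. One common robust choice that limits the influence of outlier pixels produced by inaccurate depth warps is a Huber (smooth-L1) photometric term restricted to valid pixels; the PyTorch sketch below is an illustration under that assumption, not the paper's actual loss.

```python
# Hedged sketch: a masked Huber photometric loss. `delta` is an assumed
# hyperparameter; the paper's loss may differ.
import torch

def robust_recon_loss(pred, target, mask, delta=0.1):
    """pred, target: (B,3,H,W); mask: (B,1,H,W) with 1 = supervised pixel.
    Quadratic for small residuals, linear for large ones, so pixels
    corrupted by bad depth warps contribute bounded gradients."""
    err = (pred - target).abs()
    per_pix = torch.where(err <= delta,
                          0.5 * err ** 2 / delta,
                          err - 0.5 * delta)
    return (per_pix * mask).sum() / (3 * mask.sum()).clamp(min=1)
```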