Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach

February 5, 2025
Authors: Yunuo Chen, Junli Cao, Anil Kag, Vidit Goel, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren
cs.AI

Abstract

We present a novel video generation framework that integrates 3D geometry and dynamic awareness. To achieve this, we augment 2D videos with 3D point trajectories and align them in pixel space. The resulting 3D-aware video dataset, PointVid, is then used to fine-tune a latent diffusion model, enabling it to track 2D objects with 3D Cartesian coordinates. Building on this, we regularize the shape and motion of objects in the video to eliminate undesired artifacts, e.g., nonphysical deformation. Consequently, we enhance the quality of generated RGB videos and alleviate common issues such as object morphing, which are prevalent in current video models due to a lack of shape awareness. With our 3D augmentation and regularization, our model is capable of handling contact-rich scenarios such as task-oriented videos. These videos involve complex interactions of solids, where 3D information is essential for perceiving deformation and contact. Furthermore, our model improves the overall quality of video generation by promoting the 3D consistency of moving objects and reducing abrupt changes in shape and motion.
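The abstract does not give implementation details, so the following is only a rough NumPy illustration of the two ideas it names, written under our own assumptions. `rasterize_point_video` scatters the 3D coordinates of tracked points into a pixel-aligned "point video" that could sit alongside the RGB frames, in the spirit of PointVid. `shape_motion_penalty` is a hypothetical regularizer that penalizes changes in pairwise 3D distances (nonrigid deformation) and squared acceleration (abrupt motion). The function names, array layouts, and both penalty terms are assumptions; they are not the authors' method, which applies regularization while fine-tuning a latent diffusion model.

```python
import numpy as np


def rasterize_point_video(uv, xyz, height, width):
    """Scatter tracked 3D points into a pixel-aligned 'point video' (assumed layout).

    uv:  (T, N, 2) pixel coordinates (x = column, y = row) of N tracked points.
    xyz: (T, N, 3) 3D Cartesian coordinates of the same points.
    Returns points (T, H, W, 3) and a boolean validity mask (T, H, W).
    """
    T = uv.shape[0]
    points = np.zeros((T, height, width, 3), dtype=np.float32)
    mask = np.zeros((T, height, width), dtype=bool)
    cols = np.round(uv[..., 0]).astype(int)
    rows = np.round(uv[..., 1]).astype(int)
    for t in range(T):
        # Drop points that fall outside the frame.
        inside = (rows[t] >= 0) & (rows[t] < height) & (cols[t] >= 0) & (cols[t] < width)
        r, c = rows[t][inside], cols[t][inside]
        # If several points hit the same pixel, only one of them is kept.
        points[t, r, c] = xyz[t][inside]
        mask[t, r, c] = True
    return points, mask


def shape_motion_penalty(xyz, w_shape=1.0, w_motion=1.0):
    """Hypothetical shape + motion regularizer on tracked points xyz of shape (T, N, 3).

    Shape term: squared change of pairwise distances between consecutive frames,
    which is zero for rigid motion. Motion term: squared second differences
    (discrete acceleration), which grow with abrupt changes in motion.
    """
    if xyz.shape[0] < 3:
        raise ValueError("need at least 3 frames to measure acceleration")
    diffs = xyz[:, :, None, :] - xyz[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # (T, N, N)
    shape_term = np.mean((dists[1:] - dists[:-1]) ** 2)
    accel = xyz[2:] - 2.0 * xyz[1:-1] + xyz[:-2]
    motion_term = np.mean(np.sum(accel ** 2, axis=-1))
    return w_shape * shape_term + w_motion * motion_term
```

A rigid, constant-velocity trajectory scores (numerically) zero under this penalty, while a uniformly stretching object is penalized by the shape term even though its motion is smooth. The pairwise-distance term costs O(T·N²) memory, so a practical version would likely subsample tracks.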
