CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
November 27, 2024
Authors: Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Holynski
cs.AI
Abstract
We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular
video. CAT4D leverages a multi-view video diffusion model trained on a diverse
combination of datasets to enable novel view synthesis at any specified camera
poses and timestamps. Combined with a novel sampling approach, this model can
transform a single monocular video into a multi-view video, enabling robust 4D
reconstruction via optimization of a deformable 3D Gaussian representation. We
demonstrate competitive performance on novel view synthesis and dynamic scene
reconstruction benchmarks, and highlight the creative capabilities for 4D scene
generation from real or generated videos. See our project page for results and
interactive demos: cat-4d.github.io.
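The abstract outlines a two-stage pipeline: the multi-view video diffusion model turns a monocular video into a multi-view video sampled at arbitrary (camera pose, timestamp) pairs, and that output then supervises optimization of a deformable 3D Gaussian representation. The sketch below is a minimal, hypothetical illustration of this data flow only; the function names, shapes, and stub implementations are assumptions, not the authors' code.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract.
# All names, shapes, and stub bodies are illustrative assumptions.
import numpy as np

H, W = 64, 64            # toy image resolution
N_VIEWS, N_TIMES = 4, 8  # target camera poses and timestamps

def sample_multiview_video(mono_video, cam_poses, timestamps):
    """Stand-in for the multi-view video diffusion sampler: given a monocular
    video, it would return one frame per (camera pose, timestamp) pair.
    Here it returns noise of the right shape to keep the sketch runnable."""
    return np.random.rand(len(cam_poses), len(timestamps), H, W, 3)

def optimize_deformable_gaussians(mv_video, cam_poses, timestamps, n_gaussians=1000):
    """Stand-in for 4D reconstruction: fit a deformable 3D Gaussian
    representation (static means/scales/colors plus per-timestamp offsets)
    to the generated multi-view video. Returned values are placeholders."""
    gaussians = {
        "means": np.random.randn(n_gaussians, 3),
        "scales": np.full((n_gaussians, 3), 0.05),
        "colors": np.random.rand(n_gaussians, 3),
        # one offset per timestamp per Gaussian models the scene's motion
        "deform_offsets": np.zeros((len(timestamps), n_gaussians, 3)),
    }
    # A real implementation would render the Gaussians at every (pose, time)
    # and minimize a photometric loss against mv_video; omitted here.
    return gaussians

# Toy inputs: a monocular video and the novel (pose, time) grid to synthesize.
mono_video = np.random.rand(N_TIMES, H, W, 3)
cam_poses = [np.eye(4) for _ in range(N_VIEWS)]   # 4x4 camera-to-world matrices
timestamps = np.linspace(0.0, 1.0, N_TIMES)

mv_video = sample_multiview_video(mono_video, cam_poses, timestamps)
scene_4d = optimize_deformable_gaussians(mv_video, cam_poses, timestamps)
print(mv_video.shape, scene_4d["means"].shape)
```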