CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

Abstract

We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconstruction via optimization of a deformable 3D Gaussian representation. We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks, and highlight the creative capabilities for 4D scene generation from real or generated videos. See our project page for results and interactive demos: cat-4d.github.io.
