CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
February 12, 2025
Authors: Qinghe Wang, Yawen Luo, Xiaoyu Shi, Xu Jia, Huchuan Lu, Tianfan Xue, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai
cs.AI
Abstract
In this work, we present CineMaster, a novel framework for 3D-aware and
controllable text-to-video generation. Our goal is to give users
controllability comparable to that of professional film directors: precise
placement of objects within the scene, flexible manipulation of both objects
and the camera in 3D space, and intuitive control over the layout of the
rendered frames. To achieve this, CineMaster operates in two stages. In the
first stage, we design an interactive workflow that allows users to
intuitively construct 3D-aware conditional signals by positioning object
bounding boxes and defining camera movements within the 3D space. In the
second stage, these control signals (rendered depth maps, camera trajectories,
and object class labels) guide a text-to-video diffusion model so that the
generated video matches the user's intent. Furthermore, to overcome the
scarcity of in-the-wild datasets with 3D object motion and camera pose
annotations, we carefully build an automated data annotation pipeline that
extracts 3D bounding boxes and camera trajectories from large-scale video
data. Extensive qualitative and quantitative experiments demonstrate that
CineMaster significantly outperforms existing methods and achieves strong
3D-aware text-to-video generation. Project page: https://cinemaster-dev.github.io/.
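The abstract does not say how the user-placed 3D boxes and camera movements become per-frame depth maps. What follows is a minimal, hypothetical sketch of one way to do it, not CineMaster's implementation. It rests on several assumptions. Boxes are axis-aligned in world coordinates. The camera is an ideal pinhole with world-to-camera extrinsics x_c = R x_w + t. Each pixel is ray-cast against every box, and the nearest hit is kept. The names `Box3D`, `Camera`, `render_frame` and `render_trajectory` are illustrative only. The per-pixel label map is likewise an assumed way of attaching the object class labels, not the paper's stated design.

```python
# Hypothetical sketch: not CineMaster's code. Assumes axis-aligned boxes
# and an ideal pinhole camera (x_cam = R @ x_world + t).
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


@dataclass
class Box3D:
    center: Vec3  # world-space centre
    size: Vec3    # full extent along x, y, z
    label: str    # object class label


@dataclass
class Camera:
    R: Mat3       # world-to-camera rotation
    t: Vec3       # world-to-camera translation
    fx: float
    fy: float
    cx: float
    cy: float


def _matvec_T(R: Mat3, v: Vec3) -> Vec3:
    # Computes R^T @ v; for a rotation matrix R^T is its inverse.
    return tuple(sum(R[r][c] * v[r] for r in range(3)) for c in range(3))


def _ray_box(origin: Vec3, direction: Vec3, box: Box3D):
    """Slab test: entry parameter s >= 0 along the ray, or None on a miss."""
    s_near, s_far = 0.0, math.inf
    for i in range(3):
        lo = box.center[i] - box.size[i] / 2
        hi = box.center[i] + box.size[i] / 2
        if abs(direction[i]) < 1e-12:
            # Ray parallel to this slab: it must already lie inside it.
            if not lo <= origin[i] <= hi:
                return None
            continue
        s1 = (lo - origin[i]) / direction[i]
        s2 = (hi - origin[i]) / direction[i]
        s_near = max(s_near, min(s1, s2))
        s_far = min(s_far, max(s1, s2))
        if s_near > s_far:
            return None
    return s_near


def render_frame(boxes, cam: Camera, width: int, height: int):
    """Per-pixel camera-space depth and class label of the nearest box.
    Background pixels get depth math.inf and label None."""
    # Camera centre in world coordinates: C = -R^T t.
    origin = tuple(-x for x in _matvec_T(cam.R, cam.t))
    depth = [[math.inf] * width for _ in range(height)]
    labels = [[None] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Ray through the pixel centre with camera-space z = 1, so the
            # ray parameter at a hit equals the hit point's depth.
            d_cam = ((u + 0.5 - cam.cx) / cam.fx,
                     (v + 0.5 - cam.cy) / cam.fy, 1.0)
            d_world = _matvec_T(cam.R, d_cam)
            for box in boxes:
                s = _ray_box(origin, d_world, box)
                if s is not None and s < depth[v][u]:
                    depth[v][u] = s
                    labels[v][u] = box.label
    return depth, labels


def render_trajectory(boxes, cameras, width: int, height: int):
    """One (depth, labels) pair per camera pose along the trajectory."""
    return [render_frame(boxes, cam, width, height) for cam in cameras]
```

Because every ray is scaled so that its camera-space z component is 1, the slab-test parameter equals z-depth directly, and no separate projection step is needed. Moving a camera along a path then only means passing a list of `Camera` poses to `render_trajectory`.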