Bringing Objects to Life: 4D generation from 3D objects
December 29, 2024
Authors: Ohad Rahamim, Ori Malca, Dvir Samuel, Gal Chechik
cs.AI
Abstract
Recent advancements in generative modeling now enable the creation of 4D
content (moving 3D objects) controlled with text prompts. 4D generation has
large potential in applications like virtual worlds, media, and gaming, but
existing methods provide limited control over the appearance and geometry of
generated content. In this work, we introduce a method for animating
user-provided 3D objects by conditioning on textual prompts to guide 4D
generation, enabling custom animations while maintaining the identity of the
original object. We first convert a 3D mesh into a ``static" 4D Neural Radiance
Field (NeRF) that preserves the visual attributes of the input object. Then, we
animate the object using an Image-to-Video diffusion model driven by text. To
improve motion realism, we introduce an incremental viewpoint selection
protocol for sampling perspectives to promote lifelike movement and a masked
Score Distillation Sampling (SDS) loss, which leverages attention maps to focus
optimization on relevant regions. We evaluate our model in terms of temporal
coherence, prompt adherence, and visual fidelity, and find that our method
outperforms baselines built on alternative approaches, achieving up to
threefold improvements in identity preservation measured using LPIPS scores,
and effectively balancing visual quality with dynamic content.
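
The incremental viewpoint selection protocol is only named in the abstract, not specified. As a rough illustration of the general idea (camera poses drawn from a range that widens as optimization progresses, so early steps see stable near-frontal views and later steps cover the full orbit), here is a minimal Python sketch; the linear schedule, the angle ranges, and the name `sample_viewpoint` are illustrative assumptions, not the paper's implementation.

```python
import random

def sample_viewpoint(step: int, total_steps: int,
                     start_range: float = 30.0,
                     end_range: float = 360.0) -> tuple[float, float]:
    """Sample a camera pose from an incrementally widening azimuth range.

    Early in optimization, cameras stay near a canonical front view,
    stabilizing appearance; the range then grows linearly toward full
    coverage so motion is supervised from more perspectives. All ranges
    (in degrees) are illustrative defaults, not values from the paper.
    """
    frac = step / max(total_steps - 1, 1)
    half_range = 0.5 * (start_range + frac * (end_range - start_range))
    azimuth = random.uniform(-half_range, half_range)
    # Elevation drawn from a fixed narrow band (an assumption).
    elevation = random.uniform(-10.0, 30.0)
    return azimuth, elevation

# Example: one camera pose every 250 optimization steps.
for step in range(0, 1000, 250):
    az, el = sample_viewpoint(step, 1000)
    print(f"step {step}: azimuth={az:.1f} deg, elevation={el:.1f} deg")
```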
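
The masked SDS loss can likewise be pictured as the standard Score Distillation Sampling gradient gated by a cross-attention mask, so updates concentrate on the regions the motion prompt refers to. The PyTorch sketch below assumes the mask comes from the diffusion model's cross-attention maps, already upsampled to image resolution; the tensor shapes, the normalization, and the helper name `masked_sds_grad` are hypothetical.

```python
import torch

def masked_sds_grad(noise_pred: torch.Tensor,
                    noise: torch.Tensor,
                    attn_map: torch.Tensor,
                    w_t: float) -> torch.Tensor:
    """Masked SDS gradient: the usual residual (noise_pred - noise),
    reweighted by an attention-derived spatial mask.

    noise_pred : diffusion model's noise estimate, shape (B, C, H, W)
    noise      : noise actually added to the rendered frames, (B, C, H, W)
    attn_map   : cross-attention weights for motion-related prompt tokens,
                 shape (B, 1, H, W) (assumed preprocessing)
    w_t        : timestep-dependent SDS weighting
    """
    # Min-max normalize the mask per sample (an assumption; the paper
    # may threshold or smooth the attention maps differently).
    amin = attn_map.amin(dim=(2, 3), keepdim=True)
    amax = attn_map.amax(dim=(2, 3), keepdim=True)
    mask = (attn_map - amin) / (amax - amin + 1e-8)

    # Standard SDS residual, gated so gradients flow mainly into the
    # regions the prompt's motion tokens attend to.
    return w_t * mask * (noise_pred - noise)
```

As in DreamFusion-style SDS implementations, this gradient would be injected directly into the rendered frames (e.g. `frames.backward(gradient=grad)`), bypassing backpropagation through the diffusion model itself.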