Bringing Objects to Life: 4D generation from 3D objects

December 29, 2024
Authors: Ohad Rahamim, Ori Malca, Dvir Samuel, Gal Chechik
cs.AI

Abstract

Recent advancements in generative modeling now enable the creation of 4D content (moving 3D objects) controlled with text prompts. 4D generation holds large potential in applications like virtual worlds, media, and gaming, but existing methods provide limited control over the appearance and geometry of the generated content. In this work, we introduce a method for animating user-provided 3D objects by conditioning on textual prompts to guide 4D generation, enabling custom animations while maintaining the identity of the original object. We first convert a 3D mesh into a "static" 4D Neural Radiance Field (NeRF) that preserves the visual attributes of the input object. Then, we animate the object using an Image-to-Video diffusion model driven by text. To improve motion realism, we introduce an incremental viewpoint selection protocol for sampling perspectives that promote lifelike movement, and a masked Score Distillation Sampling (SDS) loss that leverages attention maps to focus optimization on relevant regions. We evaluate our model in terms of temporal coherence, prompt adherence, and visual fidelity, and find that our method outperforms baselines based on other approaches, achieving up to threefold improvements in identity preservation as measured by LPIPS scores, and effectively balancing visual quality with dynamic content.
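
As a rough illustration of the masked SDS objective described in the abstract, the sketch below shows one optimization step in PyTorch style. It is an assumption-laden outline, not the authors' implementation: the `diffusion` wrapper, `render_fn`, and the mean-threshold heuristic for turning attention maps into a spatial mask are all hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

def masked_sds_step(render_fn, diffusion, nerf_params, camera,
                    prompt_embeds, optimizer):
    """One masked-SDS step (illustrative placeholders throughout)."""
    # Differentiably render the current 4D NeRF from a sampled viewpoint,
    # yielding a short clip of frames with gradients back to nerf_params.
    frames = render_fn(nerf_params, camera)           # (T, C, H, W) in [0, 1]

    # Encode frames to latents and perturb with noise at a random timestep.
    latents = diffusion.encode(frames)
    t = torch.randint(diffusion.min_t, diffusion.max_t, (1,))
    noise = torch.randn_like(latents)
    noisy_latents = diffusion.add_noise(latents, noise, t)

    # Query the text-driven video diffusion prior. The score network is not
    # backpropagated through (standard SDS practice); we also read out
    # cross-attention over the prompt tokens.
    with torch.no_grad():
        noise_pred, attn = diffusion.predict_noise(
            noisy_latents, t, prompt_embeds, return_attention=True)

    # Crude stand-in for the paper's attention-based mask: keep only
    # regions attending above the mean, i.e. prompt-relevant areas.
    mask = (attn > attn.mean()).float()

    # Masked SDS gradient: only attended regions are pulled toward the
    # video prior, so the object's identity is preserved elsewhere.
    grad = mask * (noise_pred - noise)
    target = (latents - grad).detach()
    loss = 0.5 * F.mse_loss(latents, target, reduction="sum")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under these assumptions, the mask confines the distillation gradient to the regions the prompt actually refers to, which is one plausible reading of how the method animates the object while leaving its appearance intact.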

