MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation

November 26, 2024
Authors: Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama, Shino Sam, Didier Stricker, Sk Aziz Ali, Muhammad Zeshan Afzal
cs.AI

Abstract

Generating high-fidelity 3D content from text prompts remains a significant challenge in computer vision due to the limited size, diversity, and annotation depth of existing datasets. To address this, we introduce MARVEL-40M+, an extensive dataset with 40 million text annotations for over 8.9 million 3D assets aggregated from seven major 3D datasets. Our contribution is a novel multi-stage annotation pipeline that integrates open-source pretrained multi-view VLMs and LLMs to automatically produce multi-level descriptions, ranging from detailed captions (150-200 words) to concise semantic tags (10-20 words). This structure supports both fine-grained 3D reconstruction and rapid prototyping. Furthermore, we incorporate human metadata from the source datasets into our annotation pipeline to add domain-specific information to the annotations and reduce VLM hallucinations. Additionally, we develop MARVEL-FX3D, a two-stage text-to-3D pipeline. We fine-tune Stable Diffusion with our annotations and use a pretrained image-to-3D network to generate 3D textured meshes within 15 seconds. Extensive evaluations show that MARVEL-40M+ significantly outperforms existing datasets in annotation quality and linguistic diversity, achieving win rates of 72.41% by GPT-4 and 73.40% by human evaluators.
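The multi-level annotation structure described above can be illustrated with a minimal sketch. Note that MARVEL-40M+ produces its captions with pretrained multi-view VLMs and LLMs; the snippet below only demonstrates the word-budget tiers (detailed vs. concise semantic tags) using naive truncation as a stand-in for an LLM summarizer. The level names, budgets as exact limits, and function names are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the multi-level caption tiers from the abstract.
# The real pipeline uses VLMs/LLMs; truncation here is only a stand-in.

# Hypothetical level names mapped to the word budgets stated in the paper.
LEVELS = {
    "detailed": (150, 200),      # supports fine-grained 3D reconstruction
    "semantic_tags": (10, 20),   # supports rapid prototyping
}

def truncate_to_budget(caption: str, max_words: int) -> str:
    """Naively cap a caption at max_words words (stand-in for an LLM rewrite)."""
    return " ".join(caption.split()[:max_words])

def multi_level_captions(detailed_caption: str) -> dict:
    """Produce one caption per level, capped at each level's upper budget."""
    return {
        name: truncate_to_budget(detailed_caption, upper)
        for name, (lower, upper) in LEVELS.items()
    }

if __name__ == "__main__":
    # A dummy 250-word caption standing in for a VLM-generated description.
    caption = " ".join(f"w{i}" for i in range(250))
    levels = multi_level_captions(caption)
    print({name: len(text.split()) for name, text in levels.items()})
```

In the actual pipeline, each tier would be generated (not truncated) so that the shorter levels preserve the most salient semantics of the asset rather than its first N words.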
