MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs
March 29, 2025
Authors: Xianglong He, Junyi Chen, Di Huang, Zexiang Liu, Xiaoshui Huang, Wanli Ouyang, Chun Yuan, Yangguang Li
cs.AI
Abstract
In the domain of 3D content creation, achieving optimal mesh topology through
AI models has long been a pursuit for 3D artists. Previous methods, such as
MeshGPT, have explored the generation of ready-to-use 3D objects via mesh
auto-regressive techniques. While these methods produce visually impressive
results, their reliance on token-by-token predictions in the auto-regressive
process leads to several significant limitations. These include extremely slow
generation speeds and an uncontrollable number of mesh faces. In this paper, we
introduce MeshCraft, a novel framework for efficient and controllable mesh
generation, which leverages continuous spatial diffusion to generate discrete
triangle faces. Specifically, MeshCraft consists of two core components: 1) a
transformer-based VAE that encodes raw meshes into continuous face-level tokens
and decodes them back to the original meshes, and 2) a flow-based diffusion
transformer conditioned on the number of faces, enabling the generation of
high-quality 3D meshes with a predefined number of faces. By utilizing the
diffusion model for the simultaneous generation of the entire mesh topology,
MeshCraft achieves high-fidelity mesh generation at significantly faster speeds
compared to auto-regressive methods. Specifically, MeshCraft can generate an
800-face mesh in just 3.2 seconds (35× faster than existing baselines).
Extensive experiments demonstrate that MeshCraft outperforms state-of-the-art
techniques in both qualitative and quantitative evaluations on the ShapeNet
dataset and demonstrates superior performance on the Objaverse dataset. Moreover, it
integrates seamlessly with existing conditional guidance strategies, showcasing
its potential to relieve artists from the time-consuming manual work involved
in mesh creation.
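
To make the contrast with token-by-token generation concrete, the following is a minimal sketch of flow-based sampling over a fixed number of face tokens. It is not the MeshCraft implementation: the names `toy_velocity`, `make_toy_target`, and `sample_face_tokens`, the token dimension, and the step count are all hypothetical, and the velocity function is a closed-form stand-in for the learned face-count-conditioned diffusion transformer. The sketch only shows that the face count is chosen before sampling starts and that every face token is updated together at each integration step.

```python
import random


def make_toy_target(num_faces, token_dim):
    """Hypothetical 'clean' face tokens used only to drive the toy velocity field."""
    return [[0.1 * i + j for j in range(token_dim)] for i in range(num_faces)]


def toy_velocity(x, t, target):
    """Stand-in for a learned velocity network (an assumption, not the paper's model).

    For a straight-line path x_t = (1 - t) * noise + t * target, the velocity
    at time t equals (target - x_t) / (1 - t).
    """
    return [
        [(g - v) / (1.0 - t) for v, g in zip(row, target_row)]
        for row, target_row in zip(x, target)
    ]


def sample_face_tokens(num_faces, token_dim, target, num_steps=50, seed=0):
    """Euler integration of the flow ODE from Gaussian noise at t=0 toward t=1."""
    rng = random.Random(seed)
    # The face count is fixed up front: one noise token per requested face.
    x = [[rng.gauss(0.0, 1.0) for _ in range(token_dim)] for _ in range(num_faces)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = toy_velocity(x, t, target)
        # All face tokens move together in one step (no per-token loop over time).
        x = [[xi + dt * vi for xi, vi in zip(row, v_row)] for row, v_row in zip(x, v)]
    return x


target = make_toy_target(num_faces=800, token_dim=8)
tokens = sample_face_tokens(num_faces=800, token_dim=8, target=target)
```

In an actual system of the kind the abstract describes, the velocity would come from the trained transformer conditioned on the face count, and the resulting continuous tokens would be passed through the VAE decoder to recover discrete triangle faces.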