Wavelet Latent Diffusion (WaLa): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings

November 12, 2024
Authors: Aditya Sanghi, Aliasghar Khani, Pradyumna Reddy, Arianna Rampini, Derek Cheung, Kamal Rahimi Malekshan, Kanika Madan, Hooman Shayani
cs.AI

Abstract

Large-scale 3D generative models require substantial computational resources yet often fall short in capturing fine details and complex geometries at high resolutions. We attribute this limitation to the inefficiency of current representations, which lack the compactness needed for generative models to be trained effectively. To address this, we introduce a novel approach called Wavelet Latent Diffusion, or WaLa, that encodes 3D shapes into wavelet-based, compact latent encodings. Specifically, we compress a 256^3 signed distance field into a 12^3 x 4 latent grid, achieving an impressive 2427x compression ratio with minimal loss of detail. This high level of compression allows our method to efficiently train large-scale generative networks without increasing inference time. Our models, both conditional and unconditional, contain approximately one billion parameters and successfully generate high-quality 3D shapes at 256^3 resolution. Moreover, WaLa offers rapid inference, producing shapes within two to four seconds depending on the condition, despite the model's scale. We demonstrate state-of-the-art performance across multiple datasets, with significant improvements in generation quality, diversity, and computational efficiency. We open-source our code and, to the best of our knowledge, release the largest pretrained 3D generative models across different modalities.
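The quoted 2427x figure follows directly from the two grid sizes given in the abstract; a minimal arithmetic sanity check (variable names are illustrative, not from the paper):

```python
# Compression ratio implied by the abstract:
# a 256^3 signed distance field compressed into a 12^3 x 4 latent grid.
sdf_values = 256 ** 3        # 16,777,216 scalar SDF samples
latent_values = 12 ** 3 * 4  # 6,912 latent values
ratio = sdf_values / latent_values
print(f"{ratio:.0f}x")       # ~2427x, matching the stated compression ratio
```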
