Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation
October 1, 2024
Authors: Junlin Han, Jianyuan Wang, Andrea Vedaldi, Philip Torr, Filippos Kokkinos
cs.AI
Abstract
Generating high-quality 3D content from text, single images, or sparse view
images remains a challenging task with broad applications. Existing methods
typically employ multi-view diffusion models to synthesize multi-view images,
followed by a feed-forward process for 3D reconstruction. However, these
approaches are often constrained by a small and fixed number of input views,
limiting their ability to capture diverse viewpoints and, even worse, leading
to suboptimal generation results if the synthesized views are of poor quality.
To address these limitations, we propose Flex3D, a novel two-stage framework
capable of leveraging an arbitrary number of high-quality input views. The
first stage consists of a candidate view generation and curation pipeline. We
employ a fine-tuned multi-view image diffusion model and a video diffusion
model to generate a pool of candidate views, enabling a rich representation of
the target 3D object. Subsequently, a view selection pipeline filters these
views based on quality and consistency, ensuring that only the high-quality and
reliable views are used for reconstruction. In the second stage, the curated
views are fed into a Flexible Reconstruction Model (FlexRM), built upon a
transformer architecture that can effectively process an arbitrary number of
inputs. FlexRM directly outputs 3D Gaussian points leveraging a tri-plane
representation, enabling efficient and detailed 3D generation. Through
extensive exploration of design and training strategies, we optimize FlexRM to
achieve superior performance in both reconstruction and generation tasks. Our
results demonstrate that Flex3D achieves state-of-the-art performance, with a
user study winning rate of over 92% in 3D generation tasks when compared to
several of the latest feed-forward 3D generative models.
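The first-stage curation step described above can be sketched as scoring each candidate view for quality and for consistency with the rest of the pool, then keeping the top-ranked views. This is a minimal illustrative sketch, not the paper's actual pipeline: the metric choices here (sharpness via gradient energy, consistency via correlation with the pool mean) are hypothetical stand-ins for the learned quality and consistency measures.

```python
# Hypothetical sketch of candidate-view curation: rank views by a
# quality proxy times a consistency proxy, keep the top k. The scoring
# functions are illustrative placeholders, not Flex3D's real selectors.
import numpy as np

def quality_score(view: np.ndarray) -> float:
    """Proxy for image quality: mean gradient energy (sharper = higher)."""
    gy, gx = np.gradient(view.astype(np.float64))
    return float(np.mean(gx ** 2 + gy ** 2))

def consistency_score(view: np.ndarray, mean_view: np.ndarray) -> float:
    """Proxy for multi-view consistency: correlation with the pool mean."""
    v, m = view.ravel(), mean_view.ravel()
    v = v - v.mean()
    m = m - m.mean()
    denom = np.linalg.norm(v) * np.linalg.norm(m) + 1e-8
    return float(v @ m / denom)

def curate_views(views: list, k: int) -> list:
    """Return indices of the top-k views by quality x consistency."""
    mean_view = np.mean(np.stack(views), axis=0)
    scores = [quality_score(v) * max(consistency_score(v, mean_view), 0.0)
              for v in views]
    order = np.argsort(scores)[::-1]  # best scores first
    return sorted(order[:k].tolist())
```

A degenerate candidate (for example a flat, featureless render) gets near-zero quality and is filtered out, so only sharp, mutually consistent views reach the reconstruction stage.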
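The second stage's output head can be pictured as decoding a tri-plane representation into per-point 3D Gaussian parameters. The sketch below is a simplified, hypothetical version: the plane resolution, feature width, and the single linear decoder `W` are assumptions for illustration, whereas FlexRM itself uses a learned transformer and decoder. It shows only the general mechanism of bilinearly sampling three axis-aligned feature planes and mapping the aggregated features to Gaussian attributes.

```python
# Illustrative sketch: decode a tri-plane into 3D Gaussian parameters
# (position offset, scale, rotation quaternion, opacity, color).
# Shapes and the linear decoder are hypothetical, not FlexRM's actual head.
import numpy as np

def sample_plane(plane: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Bilinearly sample an (R, R, C) feature plane at coords in [0, 1]."""
    R = plane.shape[0]
    x, y = u * (R - 1), v * (R - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, R - 1), np.minimum(y0 + 1, R - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    return (plane[y0, x0] * (1 - wx) * (1 - wy) + plane[y0, x1] * wx * (1 - wy)
            + plane[y1, x0] * (1 - wx) * wy + plane[y1, x1] * wx * wy)

def decode_gaussians(planes, points, W):
    """Aggregate tri-plane features at 3D points, decode Gaussian params.

    planes: (xy, xz, yz) feature planes, each (R, R, C).
    points: (N, 3) query positions in [0, 1]^3.
    W:      (C, 14) linear decoder -> 3 offset + 3 scale + 4 rot + 1 opacity + 3 rgb.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    feats = (sample_plane(planes[0], x, y)      # project onto xy plane
             + sample_plane(planes[1], x, z)    # project onto xz plane
             + sample_plane(planes[2], y, z))   # project onto yz plane
    params = feats @ W                          # (N, 14) raw parameters
    centers = points + 0.01 * np.tanh(params[:, :3])  # small position offsets
    scales = np.exp(params[:, 3:6])                   # positive scales
    rots = params[:, 6:10]
    rots = rots / (np.linalg.norm(rots, axis=1, keepdims=True) + 1e-8)
    opacity = 1.0 / (1.0 + np.exp(-params[:, 10:11]))  # sigmoid to (0, 1)
    rgb = 1.0 / (1.0 + np.exp(-params[:, 11:14]))
    return centers, scales, rots, opacity, rgb
```

Because every query point yields one Gaussian in a single forward pass, this kind of head keeps generation feed-forward and fast, which matches the efficiency claim in the abstract.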