Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
October 16, 2024
Authors: Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, Zexiang Xu
cs.AI
Abstract
We propose Long-LRM, a generalizable 3D Gaussian reconstruction model capable of reconstructing a large scene from a long sequence of input images. Specifically, our model can process 32 source images at 960x540 resolution in only 1.3 seconds on a single A100 80G GPU. Our architecture features a mixture of the recent Mamba2 blocks and classical transformer blocks, allowing many more tokens to be processed than in prior work, enhanced by efficient token-merging and Gaussian-pruning steps that balance quality and efficiency. Unlike previous feed-forward models, which are limited to processing 1–4 input images and can only reconstruct a small portion of a large scene, Long-LRM reconstructs the entire scene in a single feed-forward step. On large-scale scene datasets such as DL3DV-140 and Tanks and Temples, our method achieves performance comparable to optimization-based approaches while being two orders of magnitude more efficient. Project page: https://arthurhero.github.io/projects/llrm
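
The core architectural idea in the abstract, interleaving linear-time Mamba2 blocks with classical (quadratic-cost) transformer blocks over a long token sequence, and merging tokens to keep attention affordable, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Mamba2Block` is a hypothetical stand-in for a real Mamba2 layer (e.g., from the mamba-ssm package), all block counts, dimensions, and the merge ratio are made up for demonstration, and the Gaussian-pruning step is omitted.

```python
# Minimal sketch (NOT the authors' code) of a hybrid Mamba2/transformer
# backbone with a token-merging step, as described in the abstract.
# Mamba2Block is a placeholder; all sizes below are illustrative.
import torch
import torch.nn as nn

class Mamba2Block(nn.Module):
    """Hypothetical stand-in for a Mamba2 state-space block (linear-time
    sequence mixing). A real implementation would come from e.g. mamba-ssm."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.GRU(dim, dim, batch_first=True)  # recurrent stand-in mixer

    def forward(self, x):                   # x: (batch, seq_len, dim)
        out, _ = self.mix(self.norm(x))
        return x + out                      # residual connection

class HybridBackbone(nn.Module):
    """Runs cheap long-range Mamba2-style blocks first, merges tokens to
    shorten the sequence, then applies full-attention transformer blocks."""
    def __init__(self, dim=256, n_mamba=4, n_attn=2, merge=2):
        super().__init__()
        self.mamba_blocks = nn.ModuleList(Mamba2Block(dim) for _ in range(n_mamba))
        self.merge = nn.AvgPool1d(kernel_size=merge, stride=merge)  # token merging
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.attn_blocks = nn.TransformerEncoder(layer, num_layers=n_attn)

    def forward(self, tokens):              # tokens: (batch, seq_len, dim)
        for blk in self.mamba_blocks:       # linear-time mixing on the full sequence
            tokens = blk(tokens)
        # merge neighboring tokens (halves seq_len for merge=2) before attention
        tokens = self.merge(tokens.transpose(1, 2)).transpose(1, 2)
        return self.attn_blocks(tokens)     # quadratic attention on fewer tokens

# Tiny illustrative run; the real model handles far longer token sequences
# (32 views at 960x540 yield vastly more patch tokens than this).
x = torch.randn(1, 1024, 256)
y = HybridBackbone()(x)
print(y.shape)  # torch.Size([1, 512, 256]) after 2x token merging
```

The design point the sketch tries to capture: state-space blocks scale linearly with sequence length, so they can absorb most of the long-sequence mixing, while token merging shrinks the sequence enough that a few transformer blocks of full attention remain tractable.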