LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

October 22, 2024
Authors: Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, Zexiang Xu
cs.AI

Abstract

We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens, functioning as a fully learned scene representation, and decodes novel-view images from them; and (2) a decoder-only LVSM, which directly maps input images to novel-view outputs, completely eliminating intermediate scene representations. Both models bypass the 3D inductive biases used in previous methods, from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps), addressing novel view synthesis with a fully data-driven approach. While the encoder-decoder model offers faster inference due to its independent latent representation, the decoder-only LVSM achieves superior quality, scalability, and zero-shot generalization, outperforming previous state-of-the-art methods by 1.5 to 3.5 dB PSNR. Comprehensive evaluations across multiple datasets demonstrate that both LVSM variants achieve state-of-the-art novel view synthesis quality. Notably, our models surpass all previous methods even with reduced computational resources (1-2 GPUs). Please see our website for more details: https://haian-jin.github.io/projects/LVSM/
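
To give a sense of what a gain of 1.5 to 3.5 dB PSNR means, here is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE). It is not the authors' evaluation code. The function names `psnr` and `mse_ratio_for_gain`, the flat-list pixel representation, and the [0, 1] pixel range are assumptions made for illustration.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes pixel values lie in [0, max_value]; this is an illustrative helper,
    not the evaluation protocol used in the paper.
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of gain_db on a single image."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    print(psnr([0.0, 1.0], [0.0, 0.9]))
    print(mse_ratio_for_gain(1.5), mse_ratio_for_gain(3.5))
```

Under this definition, a 1.5 dB gain on a single image corresponds to roughly 1.41 times lower mean squared error, and a 3.5 dB gain to roughly 2.24 times lower. When PSNR is averaged over many images, as benchmark tables usually do, the relationship holds only approximately.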

