NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
December 4, 2024
Authors: Lingen Li, Zhaoyang Zhang, Yaowei Li, Jiale Xu, Xiaoyu Li, Wenbo Hu, Weihao Cheng, Jinwei Gu, Tianfan Xue, Ying Shan
cs.AI
Abstract
Recent advancements in generative models have significantly improved novel
view synthesis (NVS) from multi-view data. However, existing methods depend on
external multi-view alignment processes, such as explicit pose estimation or
pre-reconstruction, which limits their flexibility and accessibility,
especially when alignment is unstable due to insufficient overlap or occlusions
between views. In this paper, we propose NVComposer, a novel approach that
eliminates the need for explicit external alignment. NVComposer enables the
generative model to implicitly infer spatial and geometric relationships
between multiple conditional views by introducing two key components: 1) an
image-pose dual-stream diffusion model that simultaneously generates target
novel views and condition camera poses, and 2) a geometry-aware feature
alignment module that distills geometric priors from dense stereo models during
training. Extensive experiments demonstrate that NVComposer achieves
state-of-the-art performance in generative multi-view NVS tasks, removing the
reliance on external alignment and thus improving model accessibility. Our
approach shows substantial improvements in synthesis quality as the number of
unposed input views increases, highlighting its potential for more flexible and
accessible generative NVS systems.
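
The abstract does not say how the geometry-aware feature alignment module is trained. One common way to distill features from a frozen teacher is a similarity loss between the generative model's intermediate features and the teacher's features. The sketch below illustrates that general idea only and should not be read as the authors' implementation: the names `cosine_similarity` and `feature_alignment_loss`, the choice of cosine distance, and the toy feature vectors are all assumptions, and plain Python lists stand in for the tensors (and any learned projection) a real model would use.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + eps)


def feature_alignment_loss(student_feats, teacher_feats):
    """Mean cosine distance (1 - cosine similarity) over paired feature vectors.

    student_feats: features from the generative model (hypothetical).
    teacher_feats: features from a frozen dense stereo model (hypothetical).
    """
    if not student_feats or len(student_feats) != len(teacher_feats):
        raise ValueError("feature lists must be non-empty and of equal length")
    distances = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_feats, teacher_feats)
    ]
    return sum(distances) / len(distances)


# Toy data: two 2-D feature vectors per "view".
teacher = [[1.0, 0.0], [0.0, 2.0]]
aligned = [[2.0, 0.0], [0.0, 1.0]]      # same directions as the teacher
orthogonal = [[0.0, 1.0], [3.0, 0.0]]   # perpendicular to the teacher

print(feature_alignment_loss(aligned, teacher))     # near 0
print(feature_alignment_loss(orthogonal, teacher))  # 1.0
```

Cosine distance ignores feature magnitude, so in this sketch only the direction of the student features is pulled toward the teacher's. The abstract states that the distillation happens during training, so a loss of this kind would not be needed at inference time.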