
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

September 12, 2024
作者: Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu
cs.AI

Abstract

Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impedes broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.
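The abstract's key idea is anchoring skin Gaussians to joint Gaussians at the first frame, then predicting skin motion from the tracked joints during coarse alignment. The sketch below is a much-simplified illustration of that binding-and-warping scheme, not the paper's actual implementation: it uses hypothetical function names, represents Gaussians by their centers only, and blends each skin Gaussian over its k nearest joints with inverse-distance weights.

```python
import numpy as np

def anchor_skin_to_joints(skin_pos, joint_pos, k=4):
    """Bind each skin Gaussian to its k nearest joint Gaussians
    (a simplified stand-in for DualGS's first-frame anchoring)."""
    # Pairwise distances, shape (num_skin, num_joints).
    d = np.linalg.norm(skin_pos[:, None, :] - joint_pos[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                # k nearest joints per skin Gaussian
    w = 1.0 / (np.take_along_axis(d, idx, axis=1) + 1e-8)
    w /= w.sum(axis=1, keepdims=True)                 # normalized inverse-distance weights
    offsets = skin_pos[:, None, :] - joint_pos[idx]   # local offsets fixed at the first frame
    return idx, w, offsets

def warp_skin(joint_pos_t, idx, w, offsets):
    """Predict skin Gaussian positions at frame t by carrying the stored
    offsets along with the tracked joint Gaussians (coarse alignment)."""
    return (w[..., None] * (joint_pos_t[idx] + offsets)).sum(axis=1)
```

Because the offsets are stored once, only the (much smaller) joint trajectories need to be tracked and compressed per frame, which is the motion-redundancy reduction the abstract refers to; the fine-grained optimization stage would then refine the warped skin Gaussians per frame.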
