VeGaS: Video Gaussian Splatting
November 17, 2024
Authors: Weronika Smolak-Dyżewska, Dawid Malarz, Kornel Howil, Jan Kaczmarczyk, Marcin Mazur, Przemysław Spurek
cs.AI
Abstract
Implicit Neural Representations (INRs) employ neural networks to approximate
discrete data as continuous functions. In the context of video data, such
models can be utilized to transform the coordinates of pixel locations along
with frame occurrence times (or indices) into RGB color values. Although INRs
facilitate effective compression, they are unsuitable for editing purposes. One
potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such
as the Video Gaussian Representation (VGR), which is capable of encoding video
as a multitude of 3D Gaussians and is applicable for numerous video processing
operations, including editing. Nevertheless, in this case, the capacity for
modification is constrained to a limited set of basic transformations. To
address this issue, we introduce the Video Gaussian Splatting (VeGaS) model,
which enables realistic modifications of video data. To construct VeGaS, we
propose a novel family of Folded-Gaussian distributions designed to capture
nonlinear dynamics in a video stream and model consecutive frames by 2D
Gaussians obtained as respective conditional distributions. Our experiments
demonstrate that VeGaS outperforms state-of-the-art solutions in frame
reconstruction tasks and allows realistic modifications of video data. The code
is available at: https://github.com/gmum/VeGaS.
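The key construction in the abstract, obtaining a 2D Gaussian for each frame as the conditional distribution of a 3D Gaussian given the frame time, can be sketched with the standard multivariate-Gaussian conditioning formula. This is a minimal illustration only: the paper's Folded-Gaussian family extends this to nonlinear dynamics, and the function name and per-frame weight below are assumptions, not the authors' implementation.

```python
import numpy as np

def condition_on_time(mean, cov, t0):
    """Condition a 3D Gaussian N(mean, cov) over (t, x, y) on t = t0.

    Returns the mean and covariance of the resulting 2D Gaussian over
    (x, y), plus the density of t0 under the time marginal (a natural
    per-frame weight for the splat). Standard Gaussian conditioning:
    mu_2d  = mu_xy + S_xyt / S_tt * (t0 - mu_t)
    Cov_2d = S_xyxy - S_xyt S_xyt^T / S_tt
    """
    mu_t, mu_xy = mean[0], mean[1:]
    s_tt = cov[0, 0]        # Var(t), a scalar
    s_xyt = cov[1:, 0]      # Cov((x, y), t), shape (2,)
    s_xyxy = cov[1:, 1:]    # Cov((x, y)), shape (2, 2)

    mean_2d = mu_xy + s_xyt / s_tt * (t0 - mu_t)
    cov_2d = s_xyxy - np.outer(s_xyt, s_xyt) / s_tt
    weight = np.exp(-0.5 * (t0 - mu_t) ** 2 / s_tt) / np.sqrt(2 * np.pi * s_tt)
    return mean_2d, cov_2d, weight
```

With a nonzero time/space covariance, the conditional mean shifts linearly with the frame time, which is why a single 3D Gaussian already encodes (linear) motion of a 2D splat across frames; the Folded-Gaussian family is introduced precisely to go beyond this linear case.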