VeGaS: Video Gaussian Splatting

November 17, 2024
Authors: Weronika Smolak-Dyżewska, Dawid Malarz, Kornel Howil, Jan Kaczmarczyk, Marcin Mazur, Przemysław Spurek
cs.AI

Abstract

Implicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models can be utilized to transform the coordinates of pixel locations along with frame occurrence times (or indices) into RGB color values. Although INRs facilitate effective compression, they are unsuitable for editing purposes. One potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which is capable of encoding video as a multitude of 3D Gaussians and is applicable for numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream and model consecutive frames by 2D Gaussians obtained as respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data. The code is available at: https://github.com/gmum/VeGaS.
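The abstract's key mechanism is slicing a spatio-temporal Gaussian into per-frame 2D Gaussians via conditional distributions. The sketch below illustrates only the plain linear-Gaussian case of that idea using standard Gaussian conditioning formulas; it is not the paper's Folded-Gaussian family (which adds nonlinear temporal dynamics), and the function name and interface are hypothetical.

```python
import numpy as np

def condition_gaussian_on_time(mu, cov, t):
    """Slice a 3D (x, y, t) Gaussian at a fixed time t, yielding a 2D Gaussian.

    Uses standard multivariate-Gaussian conditioning:
      mean_cond = mu_xy + cov_xt * (t - mu_t) / var_t
      cov_cond  = cov_xy - cov_xt cov_xt^T / var_t
    Also returns the marginal density of t, which weights how strongly
    this Gaussian contributes to the frame at time t.
    """
    mu_xy, mu_t = mu[:2], mu[2]
    cov_xy = cov[:2, :2]        # spatial block
    cov_xt = cov[:2, 2]         # space-time cross-covariance
    var_t = cov[2, 2]           # temporal variance
    cond_mu = mu_xy + cov_xt * (t - mu_t) / var_t
    cond_cov = cov_xy - np.outer(cov_xt, cov_xt) / var_t
    weight = np.exp(-0.5 * (t - mu_t) ** 2 / var_t) / np.sqrt(2 * np.pi * var_t)
    return cond_mu, cond_cov, weight
```

With a nonzero space-time cross-covariance, the conditional mean moves linearly with t, so the 2D splat drifts across frames; VeGaS's Folded-Gaussians generalize this to nonlinear trajectories.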

