VeGaS: Video Gaussian Splatting
November 17, 2024
Authors: Weronika Smolak-Dyżewska, Dawid Malarz, Kornel Howil, Jan Kaczmarczyk, Marcin Mazur, Przemysław Spurek
cs.AI
Abstract
Implicit Neural Representations (INRs) employ neural networks to approximate
discrete data as continuous functions. In the context of video data, such
models can be utilized to transform the coordinates of pixel locations along
with frame occurrence times (or indices) into RGB color values. Although INRs
facilitate effective compression, they are unsuitable for editing purposes. One
potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such
as the Video Gaussian Representation (VGR), which is capable of encoding video
as a multitude of 3D Gaussians and is applicable for numerous video processing
operations, including editing. Nevertheless, in this case, the capacity for
modification is constrained to a limited set of basic transformations. To
address this issue, we introduce the Video Gaussian Splatting (VeGaS) model,
which enables realistic modifications of video data. To construct VeGaS, we
propose a novel family of Folded-Gaussian distributions designed to capture
nonlinear dynamics in a video stream and model consecutive frames by 2D
Gaussians obtained as respective conditional distributions. Our experiments
demonstrate that VeGaS outperforms state-of-the-art solutions in frame
reconstruction tasks and allows realistic modifications of video data. The code
is available at: https://github.com/gmum/VeGaS.
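The core idea of slicing a spatio-temporal Gaussian into per-frame 2D Gaussians can be illustrated with standard multivariate-Gaussian conditioning. The sketch below conditions a 3D Gaussian over (time, x, y) on a fixed time t; note that VeGaS itself uses the paper's Folded-Gaussian family to capture nonlinear dynamics, which this plain-Gaussian example does not include, and the variable ordering and function name are illustrative assumptions.

```python
import numpy as np

def condition_on_time(mu, cov, t):
    """Condition a 3D Gaussian over (time, x, y) on time = t.

    Returns the mean and covariance of the 2D conditional Gaussian
    over (x, y), using the standard Gaussian conditioning formulas:
        mu_cond  = mu_s + S_st * (t - mu_t) / S_tt
        cov_cond = S_ss - S_st S_ts / S_tt
    """
    mu_t, mu_s = mu[0], mu[1:]
    s_tt = cov[0, 0]          # scalar variance along time
    s_st = cov[1:, 0]         # cross-covariance between space and time
    s_ss = cov[1:, 1:]        # spatial covariance block
    cond_mu = mu_s + s_st * (t - mu_t) / s_tt
    cond_cov = s_ss - np.outer(s_st, s_st) / s_tt
    return cond_mu, cond_cov

# A single 3D Gaussian centered mid-video, sliced at t = 0.7.
mu = np.array([0.5, 0.0, 0.0])
cov = np.array([[0.04, 0.01, 0.0],
                [0.01, 0.09, 0.0],
                [0.0,  0.0,  0.09]])
m, C = condition_on_time(mu, cov, 0.7)
```

Because time covaries with x here, the conditional 2D Gaussian drifts in x as t changes, which is how a static 3D primitive encodes motion across frames.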