VeGaS: Video Gaussian Splatting

November 17, 2024
Authors: Weronika Smolak-Dyżewska, Dawid Malarz, Kornel Howil, Jan Kaczmarczyk, Marcin Mazur, Przemysław Spurek
cs.AI

Abstract

Implicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models can be utilized to transform the coordinates of pixel locations along with frame occurrence times (or indices) into RGB color values. Although INRs facilitate effective compression, they are unsuitable for editing purposes. One potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which is capable of encoding video as a multitude of 3D Gaussians and is applicable for numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream and model consecutive frames by 2D Gaussians obtained as respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data. The code is available at: https://github.com/gmum/VeGaS.
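As a rough illustration of the conditioning step described in the abstract (not the authors' implementation; VeGaS relies on the proposed Folded-Gaussian family to capture nonlinear motion along the time axis, whereas this sketch uses an ordinary 3D Gaussian), the Python snippet below derives a frame's 2D Gaussian as the conditional distribution of a joint Gaussian over (t, x, y) at a fixed frame time t, using the standard Schur-complement formulas. The function name and the example parameters are hypothetical.

    import numpy as np

    def condition_gaussian_on_time(mu, cov, t):
        # Split a 3D Gaussian over (t, x, y) into its time and spatial parts.
        mu_t, mu_xy = mu[0], mu[1:]
        var_t = cov[0, 0]            # time variance
        cov_t_xy = cov[0, 1:]        # covariance between time and (x, y)
        cov_xy = cov[1:, 1:]         # spatial covariance block

        # Standard Gaussian conditioning (Schur complement) formulas.
        mu_cond = mu_xy + cov_t_xy * (t - mu_t) / var_t
        cov_cond = cov_xy - np.outer(cov_t_xy, cov_t_xy) / var_t
        return mu_cond, cov_cond

    # Hypothetical example: the time-x coupling makes the conditional 2D
    # Gaussian drift in x as the frame time advances.
    mu = np.array([0.5, 100.0, 50.0])      # mean over (t, x, y)
    cov = np.array([
        [0.04, 0.20, 0.00],
        [0.20, 4.00, 0.00],
        [0.00, 0.00, 1.00],
    ])
    for t in (0.0, 0.5, 1.0):
        center, sigma = condition_gaussian_on_time(mu, cov, t)
        print(f"t={t:.1f}  center={center}  covariance diag={np.diag(sigma)}")

In this linear sketch the conditional mean moves along a straight line in time; per the abstract, the Folded-Gaussian construction is what lets VeGaS model nonlinear trajectories instead.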
