Implicit Neural Representation for Videos (NeRV) has introduced a novel paradigm for video representation and compression, outperforming traditional codecs. As model size grows, however, slow encoding and decoding and high memory consumption hinder its practical application. To address these limitations, we propose a new video representation and compression method based on 2D Gaussian Splatting. Our deformable 2D Gaussian Splatting dynamically adapts the transformation of the 2D Gaussians at each frame, significantly reducing memory cost. Equipped with a multi-plane-based spatiotemporal encoder and a lightweight decoder, it predicts changes in the color, coordinates, and shape of the initialized Gaussians for a given time step. By leveraging temporal gradients, our model captures temporal redundancy at negligible cost, significantly enhancing video representation efficiency. Our method reduces GPU memory usage by up to 78.4% and significantly speeds up video processing, achieving 5.5x faster training and 12.5x faster decoding than state-of-the-art NeRV methods.
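The pipeline described in the abstract (a set of initialized 2D Gaussians, a multi-plane spatiotemporal encoder, and a lightweight decoder that predicts per-frame changes in color, coordinates, and shape) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the three planes `xy`, `xt`, `yt` with nearest-neighbour lookup and element-wise fusion, the single linear layer standing in for the decoder, the additive blending in `render`, and all function and parameter names. Weights are random, so the sketch shows only the data flow, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_plane(plane, u, v):
    """Nearest-neighbour lookup of a (R, R, C) feature plane at coords in [0, 1]."""
    res = plane.shape[0]
    iu = np.clip(np.rint(u * (res - 1)).astype(int), 0, res - 1)
    iv = np.clip(np.rint(v * (res - 1)).astype(int), 0, res - 1)
    return plane[iv, iu]  # (N, C)

def encode(planes, xy, t):
    """Hypothetical multi-plane encoder: fuse xy, xt and yt features by product."""
    x, y = xy[:, 0], xy[:, 1]
    tt = np.full_like(x, t)
    return (sample_plane(planes["xy"], x, y)
            * sample_plane(planes["xt"], x, tt)
            * sample_plane(planes["yt"], y, tt))

def decode(features, weight, bias):
    """Stand-in lightweight decoder: one linear layer producing 8 deltas per Gaussian."""
    out = features @ weight + bias
    return out[:, 0:2], out[:, 2:4], out[:, 4], out[:, 5:8]

def deform(canonical, planes, weight, bias, t):
    """Apply predicted changes in coordinates, shape and color at time t in [0, 1]."""
    feats = encode(planes, canonical["mean"], t)
    d_mean, d_scale, d_theta, d_color = decode(feats, weight, bias)
    return {
        "mean": canonical["mean"] + d_mean,
        "scale": canonical["scale"] * np.exp(d_scale),  # keeps scales positive
        "theta": canonical["theta"] + d_theta,
        "color": np.clip(canonical["color"] + d_color, 0.0, 1.0),
    }

def render(g, height, width):
    """Additively blend N rotated anisotropic 2D Gaussians into an (H, W, 3) image."""
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs / (width - 1)   # normalized pixel coordinates
    py = ys / (height - 1)
    dx = px[None] - g["mean"][:, 0, None, None]  # (N, H, W)
    dy = py[None] - g["mean"][:, 1, None, None]
    c = np.cos(g["theta"])[:, None, None]
    s = np.sin(g["theta"])[:, None, None]
    u = (c * dx + s * dy) / g["scale"][:, 0, None, None]   # Gaussian-local axes
    v = (-s * dx + c * dy) / g["scale"][:, 1, None, None]
    w = np.exp(-0.5 * (u ** 2 + v ** 2))
    return np.clip(np.einsum("nhw,nc->hwc", w, g["color"]), 0.0, 1.0)

# Toy setup: 64 Gaussians, 8 feature channels, 16x16 planes (all arbitrary choices).
n, ch = 64, 8
canonical = {
    "mean": rng.uniform(0.0, 1.0, (n, 2)),
    "scale": rng.uniform(0.02, 0.1, (n, 2)),
    "theta": rng.uniform(0.0, np.pi, n),
    "color": rng.uniform(0.0, 1.0, (n, 3)),
}
planes = {k: 0.1 * rng.standard_normal((16, 16, ch)) for k in ("xy", "xt", "yt")}
weight, bias = 0.1 * rng.standard_normal((ch, 8)), np.zeros(8)

frame = render(deform(canonical, planes, weight, bias, t=0.5), height=32, width=48)
```

In this sketch only one set of Gaussians is stored, and each frame is obtained by querying the encoder and decoder at a different `t`, which is one plausible reading of how a deformable representation avoids keeping separate Gaussians per frame.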