We present a video decomposition method that facilitates layer-based editing of videos with spatiotemporally varying lighting and motion effects. Our neural model decomposes an input video into multiple layered representations, each comprising a 2D texture map, a mask for the original video, and a multiplicative residual that characterizes the spatiotemporal variations in lighting conditions. A single edit on a texture map can be propagated to the corresponding locations in every frame of the video while keeping the rest of the content consistent. Our method efficiently learns the layer-based neural representations of a 1080p video in 25 s per frame via coordinate hashing and allows real-time rendering of the edited result at 71 fps on a single GPU. Qualitatively, we run our method on various videos to show its effectiveness in generating high-quality editing effects. Quantitatively, we propose adopting feature-tracking evaluation metrics to objectively assess the consistency of video editing. Project page: https://lightbulb12294.github.io/hashing-nvd/
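To make the layered representation concrete, the following is a minimal sketch of how a frame might be recomposed from its layers. It assumes that each pixel is the mask-weighted sum of every layer's texture color scaled by that layer's multiplicative residual. The function names, the dictionary layout (`texture`, `uv`, `mask`, `residual`), and the nearest-neighbour texture lookup are illustrative assumptions, not the authors' implementation, and the coordinate-hashing networks that would predict these quantities are omitted.

```python
import numpy as np

def sample_texture(texture, uv):
    """Nearest-neighbour lookup of an (Ht, Wt, 3) texture at (H, W, 2) UV coords in [0, 1]."""
    ht, wt = texture.shape[:2]
    x = np.clip(np.rint(uv[..., 0] * (wt - 1)).astype(int), 0, wt - 1)
    y = np.clip(np.rint(uv[..., 1] * (ht - 1)).astype(int), 0, ht - 1)
    return texture[y, x]

def composite_frame(layers):
    """Assumed compositing rule: sum over layers of mask * residual * texture(uv)."""
    frame = 0.0
    for layer in layers:
        color = sample_texture(layer["texture"], layer["uv"])      # (H, W, 3)
        lit = layer["residual"] * color                            # lighting applied multiplicatively
        frame = frame + layer["mask"][..., None] * lit             # mask is (H, W)
    return frame

def make_demo_layers(h=4, w=6):
    """Two hypothetical layers for one frame: a background and a dimmed foreground patch."""
    u, v = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h))
    uv = np.stack([u, v], axis=-1)
    fg_mask = np.zeros((h, w))
    fg_mask[1:3, 2:4] = 1.0
    background = {
        "texture": np.full((8, 8, 3), 0.2),
        "uv": uv,
        "mask": 1.0 - fg_mask,
        "residual": np.ones((h, w, 3)),
    }
    foreground = {
        "texture": np.full((8, 8, 3), 0.8),
        "uv": uv,
        "mask": fg_mask,
        "residual": np.full((h, w, 3), 0.5),   # e.g. the object sits in shadow
    }
    return [background, foreground]

if __name__ == "__main__":
    layers = make_demo_layers()
    before = composite_frame(layers)
    layers[1]["texture"][:] = [1.0, 0.0, 0.0]  # one edit on the foreground texture map
    after = composite_frame(layers)
    print(before[1, 2], after[1, 2], after[0, 0])
```

Under this assumed rule, an edit is made once on a texture map and reaches every frame through that frame's UV coordinates and mask, while the residual reapplies the original lighting variation on top of the edited colors.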