MIMO (multiple input, multiple output) approaches are a recent trend in neural network architectures for video restoration, in which each network evaluation produces multiple output frames. The video is split into non-overlapping stacks of frames that are processed independently, which yields an appealing trade-off between output quality and computational cost. In this work we focus on the low-latency setting, in which the number of available future frames is limited. We find that MIMO architectures suffer from problems that have so far received little attention: (1) performance drops significantly because of the reduced temporal receptive field, particularly for frames at the borders of a stack, and (2) strong temporal discontinuities at stack transitions induce a step-wise motion artifact. We propose two simple solutions to alleviate these problems: recurrence across MIMO stacks, which boosts output quality by implicitly increasing the temporal receptive field, and overlapping output stacks, which smooth the temporal discontinuity at stack transitions. These modifications can be applied to any MIMO architecture. We test them on three state-of-the-art video denoising networks with different computational costs. The proposed contributions yield a new state of the art for low-latency networks, in terms of both reconstruction error and temporal consistency. As an additional contribution, we introduce a new benchmark of drone footage that highlights temporal consistency issues not apparent in the standard benchmarks.
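To make the two modifications concrete, here is a minimal sketch of overlapped MIMO inference with a recurrent state carried between stacks. It is an illustration under stated assumptions, not the authors' method. The helper names (stack_starts, blend_weight, run_overlapped_mimo), the model interface (frames, state) -> (outputs, state), and the blending rule (a linear cross-fade over the shared frames) are all hypothetical. Frames are plain floats standing in for images.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = float  # stand-in for an image; a real pipeline would use arrays
State = Optional[List[Frame]]
# Hypothetical MIMO network interface (an assumption, not the paper's API):
# maps a stack of frames plus a recurrent state to one output per input frame
# and a new state that is passed on to the next stack.
Model = Callable[[Sequence[Frame], State], Tuple[List[Frame], State]]


def stack_starts(num_frames: int, stack_size: int, overlap: int) -> List[int]:
    """Start index of each stack; consecutive stacks share `overlap` frames."""
    if num_frames < stack_size:
        raise ValueError("need at least one full stack of frames")
    if not 0 <= overlap < stack_size:
        raise ValueError("overlap must satisfy 0 <= overlap < stack_size")
    stride = stack_size - overlap
    starts = list(range(0, num_frames - stack_size + 1, stride))
    if starts[-1] + stack_size < num_frames:
        # Add one stack aligned to the end so the final frames are covered.
        starts.append(num_frames - stack_size)
    return starts


def blend_weight(i: int, stack_size: int, overlap: int) -> float:
    """Assumed linear ramp: frames near a stack border get less weight.

    With overlap == 0 every weight is 1, i.e. plain non-overlapping MIMO.
    """
    ramp = overlap + 1
    return min(1.0, (i + 1) / ramp, (stack_size - i) / ramp)


def run_overlapped_mimo(frames: Sequence[Frame], model: Model,
                        stack_size: int, overlap: int) -> List[Frame]:
    n = len(frames)
    acc = [0.0] * n   # weighted sum of outputs per frame
    wsum = [0.0] * n  # sum of weights per frame
    state: State = None  # recurrence across stacks
    for s in stack_starts(n, stack_size, overlap):
        outputs, state = model(frames[s:s + stack_size], state)
        for i, out in enumerate(outputs):
            w = blend_weight(i, stack_size, overlap)
            acc[s + i] += w * out
            wsum[s + i] += w
    # Normalize: in a shared region the two stacks cross-fade linearly.
    return [a / w for a, w in zip(acc, wsum)]


def identity_model(stack: Sequence[Frame], state: State) -> Tuple[List[Frame], State]:
    """Toy model: returns its input and passes its outputs on as state."""
    out = list(stack)
    return out, out
```

For example, run_overlapped_mimo([float(t) for t in range(10)], identity_model, 4, 2) processes stacks starting at frames 0, 2, 4 and 6 and returns the input frames unchanged, since blending identical outputs is a no-op. In a shared region the weights of the outgoing and incoming stacks sum to one, so a real network's stack-to-stack jump is spread over several frames instead of occurring at a single transition. Note that this sketch does not model the limited future-frame budget of the low-latency setting.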