Video restoration, which aims to restore clear frames from degraded videos, has numerous important applications. The key to video restoration is exploiting inter-frame information. However, existing deep learning methods often rely on complicated network components, such as optical flow estimation, deformable convolution, and cross-frame self-attention layers, resulting in high computational costs. In this study, we propose a simple yet effective framework for video restoration. Our approach is based on grouped spatial-temporal shift, a lightweight and straightforward technique that implicitly captures inter-frame correspondences for multi-frame aggregation. By introducing grouped spatial shift, we attain expansive effective receptive fields. Combined with basic 2D convolution, this simple framework can effectively aggregate inter-frame information. Extensive experiments demonstrate that our framework outperforms the previous state-of-the-art method on both video deblurring and video denoising tasks while using less than a quarter of its computational cost. These results indicate that our approach can significantly reduce computational overhead while maintaining high-quality results. Code is available at https://github.com/dasongli1/Shift-Net.
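
The idea of a grouped spatial-temporal shift can be illustrated with a minimal NumPy sketch. This is not the implementation from the Shift-Net repository: the function names (`shift2d`, `grouped_spatial_shift`, `grouped_spatial_temporal_shift`), the half-and-half channel split between the previous and next frame, the zero filling at borders, and the example offsets are all assumptions chosen for illustration. The sketch only shows how features from neighboring frames could be moved into the current frame and displaced in several directions, so that a subsequent plain 2D convolution (not shown) sees misaligned content from adjacent frames.

```python
import numpy as np


def shift2d(x, dy, dx):
    """Shift a (C, H, W) array by (dy, dx) pixels, filling vacated pixels with zeros."""
    _, h, w = x.shape
    p = max(abs(dy), abs(dx))
    # Zero-pad spatially, then crop a window displaced by the requested offset.
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    return padded[:, p - dy:p - dy + h, p - dx:p - dx + w]


def grouped_spatial_shift(x, offsets):
    """Split the channels of x (C, H, W) into len(offsets) groups and
    shift each group by its own (dy, dx) offset."""
    groups = np.array_split(x, len(offsets), axis=0)
    shifted = [shift2d(g, dy, dx) for g, (dy, dx) in zip(groups, offsets)]
    return np.concatenate(shifted, axis=0)


def grouped_spatial_temporal_shift(frames, offsets):
    """frames: (T, C, H, W) feature maps.

    Assumed scheme (illustrative only): for frame t, the first half of the
    channels is taken from frame t-1 and the second half from frame t+1,
    each with a grouped spatial shift applied. Missing neighbors at the
    sequence boundaries are left as zeros.
    """
    t, c, _, _ = frames.shape
    half = c // 2
    out = np.zeros_like(frames)
    for i in range(t):
        if i > 0:
            out[i, :half] = grouped_spatial_shift(frames[i - 1, :half], offsets)
        if i < t - 1:
            out[i, half:] = grouped_spatial_shift(frames[i + 1, half:], offsets)
    return out


# Example: 3 frames, 4 channels, 5x5 features; two hypothetical shift directions.
frames = np.ones((3, 4, 5, 5))
offsets = [(0, 1), (1, 0)]  # (dy, dx): right by one pixel, down by one pixel
shifted = grouped_spatial_temporal_shift(frames, offsets)
```

In a full network, the shifted features would typically be fused with the unshifted features of the current frame by ordinary 2D convolutions; using more groups with larger offsets widens the region of neighboring frames that each output pixel can draw on.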