Video restoration aims to recover high-quality videos from low-quality observations. It encompasses several important sub-tasks, such as video denoising, deblurring, and low-light enhancement, because videos often suffer from different types of degradation, such as blur, low light, and noise. Worse still, these degradations can occur simultaneously when videos are captured in extreme environments, which makes removing all of these artifacts at once particularly challenging. In this paper, we propose what is, to the best of our knowledge, the first efficient end-to-end video transformer approach for the joint task of video deblurring, low-light enhancement, and denoising. We build a novel multi-tier transformer in which each tier uses video at a different level of degradation as its target, so that video features are learned effectively. Moreover, we carefully design a new tier-to-tier feature fusion scheme that learns video features incrementally and, together with a suitable adaptive weighting scheme, accelerates training. We also provide a new Multiscene-Lowlight-Blur-Noise (MLBN) dataset, generated from the RealBlur dataset and YouTube videos according to the characteristics of the joint task, to simulate realistic scenes as closely as possible. Extensive experiments comparing our approach with many previous state-of-the-art methods clearly demonstrate its effectiveness.
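This summary does not specify how the tier-wise targets and the adaptive weighting are combined during training. As a rough illustration only, the Python sketch below assumes that each tier's output is supervised by its own target (for example, progressively less degraded versions of the same clip) with an L1 loss, and that the per-tier weights shift from shallow to deep tiers as training progresses. The function names `tier_weights` and `multi_tier_loss`, the softmax-style schedule, and the ordering of targets are hypothetical and are not taken from the method described here. Frames are flattened lists of floats to keep the example dependency-free.

```python
import math


def l1(pred, target):
    """Mean absolute error between two equally sized flattened frames."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def tier_weights(num_tiers, epoch, total_epochs):
    """Hypothetical adaptive weighting schedule (an assumption, not the paper's).

    Early in training the shallow tiers (easier, partially restored targets)
    get the most weight; by the end, the deepest tier (fully clean target)
    dominates. Halfway through, all tiers are weighted equally.
    The returned weights always sum to 1 (softmax over per-tier scores).
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    slope = 4.0 * progress - 2.0  # goes from -2 (start) to +2 (end)
    centre = (num_tiers - 1) / 2
    scores = [(k - centre) * slope for k in range(num_tiers)]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_tier_loss(tier_outputs, tier_targets, epoch, total_epochs):
    """Weighted sum of per-tier L1 losses (hypothetical deep-supervision loss)."""
    if len(tier_outputs) != len(tier_targets):
        raise ValueError("need one target per tier")
    weights = tier_weights(len(tier_outputs), epoch, total_epochs)
    return sum(
        w * l1(out, tgt)
        for w, out, tgt in zip(weights, tier_outputs, tier_targets)
    )


if __name__ == "__main__":
    # Three tiers; targets assumed to go from "denoised only" to "fully clean".
    outputs = [[0.2, 0.4], [0.5, 0.5], [0.9, 0.8]]
    targets = [[0.2, 0.5], [0.5, 0.6], [1.0, 0.8]]
    print(tier_weights(3, epoch=0, total_epochs=10))
    print(multi_tier_loss(outputs, targets, epoch=5, total_epochs=10))
```

Shifting weight from shallow to deep tiers is one common way to make coarse, easier targets guide early optimisation; a real implementation would operate on tensors and might learn the weights instead of scheduling them.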