Self-supervised video denoising aims to remove noise from videos without ground-truth data, leveraging the video itself to recover clean frames. Existing methods often rely on simplistic feature stacking or apply optical flow without thorough analysis, under-utilizing both inter-frame and intra-frame information and neglecting the potential of optical flow alignment under self-supervised conditions, which leads to biased and insufficient denoising. To this end, we first examine the practicality of optical flow in the self-supervised setting and introduce a SpatioTemporal Blind-spot Network (STBN) for global frame feature utilization. In the temporal domain, we perform bidirectional blind-spot feature propagation through the proposed blind-spot alignment block, ensuring accurate temporal alignment and effectively capturing long-range dependencies. In the spatial domain, we introduce a spatial receptive field expansion module that enlarges the receptive field and improves global perception. Additionally, to reduce the sensitivity of optical flow estimation to noise, we propose an unsupervised optical flow distillation mechanism that refines fine-grained inter-frame interactions during flow alignment. Our method achieves superior performance on both synthetic and real-world video denoising datasets. The source code is publicly available at https://github.com/ZKCCZ/STBN.
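As a point of intuition for readers unfamiliar with blind-spot networks, the sketch below illustrates the blind-spot principle in its simplest form: each output pixel is predicted only from its neighbors, never from itself, so a model trained to reconstruct the noisy input cannot trivially copy the noise through. This is a minimal toy example of the general concept, not the STBN architecture; the neighborhood-mean "predictor" and the function name are illustrative stand-ins for a learned network.

```python
# Toy illustration of the blind-spot principle (NOT the paper's STBN
# architecture): predict each pixel from its neighborhood while
# excluding the center pixel, so zero-mean noise cannot be copied.

def blind_spot_mean(img, r=1):
    """Estimate each pixel as the mean of its (2r+1)x(2r+1) neighbors,
    excluding the pixel itself (the 'blind spot')."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if dy == 0 and dx == 0:
                        continue  # blind spot: the center never sees itself
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        vals.append(img[ny][nx])
            out[y][x] = sum(vals) / len(vals)
    return out

# A single-pixel noise spike is ignored at its own location, because
# the estimate there is built only from the clean neighbors.
noisy = [[1.0, 1.0, 1.0],
         [1.0, 9.0, 1.0],  # 9.0 is a noise spike on a flat region of 1s
         [1.0, 1.0, 1.0]]
denoised = blind_spot_mean(noisy)
print(denoised[1][1])  # center is recovered from its 8 neighbors
```

In a real blind-spot network, this hand-crafted neighborhood mean is replaced by learned masked or shifted convolutions with the same receptive-field constraint; the paper's temporal extension additionally propagates such blind-spot features across flow-aligned frames.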