This paper studies the challenging problem of recovering motion from blur, also known as joint deblurring and interpolation or blur temporal super-resolution. The challenges are twofold: 1) current methods still leave considerable room for improvement in visual quality, even on synthetic data, and 2) they generalize poorly to real-world data. To this end, we propose a blur interpolation transformer (BiT) to effectively unravel the underlying temporal correlation encoded in blur. Building on multi-scale residual Swin transformer blocks, we introduce dual-end temporal supervision and temporally symmetric ensembling strategies to generate effective features for time-varying motion rendering. In addition, we design a hybrid camera system to collect the first real-world dataset of one-to-many blur-sharp video pairs. Experimental results show that BiT achieves a significant gain over state-of-the-art methods on the public Adobe240 dataset. Moreover, the proposed real-world dataset helps the model generalize well to real blurry scenarios. Code and data are available at https://github.com/zzh-tech/BiT.