We consider the challenging task of training models for image-to-video deblurring, which aims to recover a sequence of sharp images from a single blurry input image. A critical issue hindering the training of an image-to-video model is the ambiguity of the frame ordering, since both the forward and the backward sequences are plausible solutions. This paper proposes an effective self-supervised ordering scheme that enables training high-quality image-to-video deblurring models. Unlike previous methods that rely on order-invariant losses, we assign an explicit order to each video sequence, thus avoiding the order-ambiguity issue. Specifically, we map each video sequence to a vector in a high-dimensional latent space such that there exists a hyperplane that, for every video sequence, separates the vector extracted from the sequence from the vector extracted from its reversed sequence. The side of the hyperplane on which a vector lies is then used to define the order of the corresponding sequence. Finally, we propose a real-image dataset for the image-to-video deblurring problem that covers a variety of popular domains, including face, hand, and street. Extensive experimental results confirm the effectiveness of our method. Code and data are available at https://github.com/VinAIResearch/HyperCUT.git
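To illustrate the hyperplane idea, the following is a minimal sketch and not the released implementation. It assumes a hypothetical antisymmetric embedding `embed` (a fixed random linear projection of the frames minus the same projection of the reversed frames) and a hypothetical hyperplane through the origin with normal `normal`; in the paper the mapping is learned in a self-supervised manner, whereas here it is hand-built only so that a sequence and its reverse are guaranteed to fall on opposite sides.

```python
import random

def embed(frames, proj):
    """Hypothetical antisymmetric embedding: h(s) = g(s) - g(reversed s).

    By construction h(reversed s) = -h(s), so a sequence and its reverse
    lie on opposite sides of any hyperplane through the origin.
    """
    flat = [p for frame in frames for p in frame]
    flat_rev = [p for frame in reversed(frames) for p in frame]
    diff = [a - b for a, b in zip(flat, flat_rev)]
    return [sum(w * d for w, d in zip(row, diff)) for row in proj]

def canonical_order(frames, proj, normal):
    """Return the frames in the order whose embedding has a non-negative side."""
    side = sum(n * h for n, h in zip(normal, embed(frames, proj)))
    return list(frames) if side >= 0 else list(reversed(frames))

# Toy data (assumption): 7 frames of 16 pixels each, 8-dimensional latent space.
rng = random.Random(0)
frames = [[rng.random() for _ in range(16)] for _ in range(7)]
proj = [[rng.gauss(0, 1) for _ in range(7 * 16)] for _ in range(8)]
normal = [rng.gauss(0, 1) for _ in range(8)]

fwd = canonical_order(frames, proj, normal)        # forward input
bwd = canonical_order(frames[::-1], proj, normal)  # reversed input
```

Under these assumptions, both the forward and the reversed input are mapped to the same canonical ordering, so a deblurring network could be supervised with an ordinary order-sensitive loss against that ordering.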