Image alignment and image restoration are classical computer vision tasks. However, there is still a lack of datasets that provide enough data to train and evaluate end-to-end deep learning models. Obtaining ground-truth data for image alignment requires sophisticated structure-from-motion methods or optical flow systems, which often yield too little data variance: they typically provide a high number of image correspondences but introduce only few changes of scenery within the underlying image sequences. Alternative approaches apply random perspective distortions to existing image data, but these yield only trivial distortions that lack the complexity and variance of real-world scenarios. Our proposed data augmentation instead addresses data scarcity through 3D rendering: images are applied as textures to a plane, and varying lighting conditions, shadows, and occlusions are then added to the scene. The scene is rendered from multiple viewpoints, generating perspective distortions that are more consistent with real-world scenarios, with homographies that closely resemble those of camera projections rather than randomized homographies. For each scene, we provide a sequence of distorted images with corresponding occlusion masks, homographies, and ground-truth labels. The resulting dataset can serve as a training and evaluation set for a multitude of tasks involving image alignment and artifact removal, such as deep homography estimation, dense image matching, 2D bundle adjustment, inpainting, shadow removal, denoising, content retrieval, and background subtraction. Our data generation pipeline is customizable and can be applied to any existing dataset, serving as a data augmentation step to further improve the feature learning of existing methods.
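To illustrate what a homography "resembling a camera projection" means, the following is a minimal sketch, not the authors' pipeline. It assumes a pinhole camera with intrinsics `K`, a textured plane at z = 0 in world coordinates, and a world-to-camera pose `(R, t)`; under these assumptions the plane-to-image homography is `H = K [r1 r2 t]`, where `r1` and `r2` are the first two columns of `R`. All function names, the intrinsics, and the pose values are hypothetical and chosen only for illustration.

```python
import numpy as np


def rotation_xy(tilt, pan):
    """Rotation built from a tilt about the x-axis and a pan about the y-axis (radians)."""
    ct, st = np.cos(tilt), np.sin(tilt)
    cp, sp = np.cos(pan), np.sin(pan)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    return rx @ ry


def plane_homography(K, R, t):
    """Homography mapping points (X, Y) on the plane z = 0 to image pixels."""
    # For z = 0 the third column of R drops out of the projection K [R | t].
    return K @ np.column_stack((R[:, 0], R[:, 1], t))


def apply_homography(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    pts_h = np.column_stack((pts, np.ones(len(pts))))
    proj = pts_h @ H.T
    return proj[:, :2] / proj[:, 2:3]


def project_points(K, R, t, pts3d):
    """Full pinhole projection of (N, 3) world points, used here as a reference."""
    cam = pts3d @ R.T + t
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:3]


# Hypothetical intrinsics and viewpoint (assumed values, not from the paper).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = rotation_xy(np.deg2rad(20.0), np.deg2rad(-15.0))
t = np.array([0.1, -0.05, 3.0])

H = plane_homography(K, R, t)

# Corners of a 2 x 2 textured plane centred at the world origin.
corners = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
warped = apply_homography(H, corners)
print(warped)
```

Sampling several poses `(R, t)` around the plane would give a sequence of such homographies, each physically realizable by a camera, in contrast to homographies obtained by randomly perturbing image corners.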