This paper proposes a deep-learning-based solution for multi-modal alignment of images taken by unmanned aerial vehicles (UAVs). Many recently proposed state-of-the-art alignment techniques rely on Lucas-Kanade (LK) based solutions. However, we show that state-of-the-art results can be achieved without LK-based methods. Our approach uses a two-branch convolutional neural network (CNN) built from feature embedding blocks. We propose two variants: the first (ModelA) directly predicts the new coordinates of only the four corners of the image to be aligned, and the second (ModelB) directly predicts the homography matrix. Aligning on the image corners forces the algorithm to match only those four corners rather than computing and matching many (key)points, which can introduce many outliers and yield a less accurate alignment. We test the proposed approach on four aerial datasets and obtain state-of-the-art results compared with recent deep LK-based architectures.
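The two output parameterizations are closely related: four corner correspondences determine a homography up to scale. The following is a minimal sketch of that relationship, not the implementation used in this work. It assumes the standard direct linear transform with the bottom-right entry of the matrix fixed to 1; the function names, the 128x128 image size, and the predicted corner values are hypothetical and chosen only for illustration.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Solve for the 3x3 homography mapping four src points to four dst points.

    Assumes H[2, 2] = 1, which holds for all but degenerate configurations.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 8 unknowns.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
    A = np.array(rows)
    b = dst.reshape(-1)  # [u1, v1, u2, v2, ...] matches the row order above
    h = np.linalg.solve(A, b)
    return np.append(h, 1.0).reshape(3, 3)


def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical example: corners of a 128x128 image and illustrative
# corner positions such as a corner-regression model might output.
corners = np.array([[0, 0], [127, 0], [127, 127], [0, 127]], dtype=float)
predicted = np.array([[3, -2], [130, 4], [124, 131], [-5, 125]], dtype=float)

H = homography_from_corners(corners, predicted)
print(np.allclose(warp_points(H, corners), predicted))
```

Under these assumptions, a four-corner prediction can always be converted into a homography after the fact, so the two variants differ in what the network regresses rather than in the family of transformations they can express.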