Because of differences in viewing range, resolution, and relative position, a multi-modality sensing module composed of infrared and visible cameras must be registered to achieve more accurate scene perception. In practice, manual calibration-based registration is the most widely used approach, and it must be repeated regularly to maintain accuracy, which is time-consuming and labor-intensive. To cope with these problems, we propose a scene-adaptive infrared and visible image registration method. Specifically, to address the discrepancy between multi-modality images, an invertible translation process is developed to establish a modality-invariant domain, which comprehensively embraces the feature intensity and distribution of both infrared and visible modalities. We employ homography to simulate the deformation between different planes and develop a hierarchical framework to rectify the deformation inferred from the proposed latent representation in a coarse-to-fine manner. To this end, advanced perception ability coupled with residual estimation is conducive to the regression of sparse offsets, and an alternate correlation search facilitates more accurate correspondence matching. Moreover, we propose the first misaligned infrared and visible image dataset with available ground truth, comprising three synthetic sets and one real-world set. Extensive experiments validate the effectiveness of the proposed method against state-of-the-art methods, advancing subsequent applications.
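To make the homography and coarse-to-fine ideas concrete, the following is a minimal sketch and not the authors' implementation. It assumes the "sparse offsets" are displacements of the four image corners (a common parameterization in deep homography estimation, which the abstract does not confirm), and that every level's residual offsets are expressed in the same pixel coordinates. The function names `homography_from_offsets`, `apply_homography`, and `compose_coarse_to_fine` are hypothetical and introduced only for illustration.

```python
import numpy as np


def homography_from_offsets(corners, offsets):
    """Recover a 3x3 homography mapping corners -> corners + offsets.

    Uses the direct linear transform (DLT); the four-corner offset
    parameterization is an assumption made for this sketch.
    """
    src = np.asarray(corners, dtype=float)
    dst = src + np.asarray(offsets, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the null vector of the 8x9 system, i.e. the
    # right-singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def apply_homography(H, points):
    """Map an (N, 2) array of points through H using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def compose_coarse_to_fine(corners, residual_offsets):
    """Accumulate per-level residual homographies, coarsest level first.

    Each level is assumed to estimate offsets on the image already
    warped by the previous levels, so its homography is applied last.
    """
    H_total = np.eye(3)
    for offsets in residual_offsets:
        H_res = homography_from_offsets(corners, offsets)
        H_total = H_res @ H_total
    return H_total / H_total[2, 2]


if __name__ == "__main__":
    corners = [[0, 0], [100, 0], [100, 100], [0, 100]]
    coarse = [[4, 2], [-3, 5], [1, -4], [2, 2]]   # illustrative values only
    fine = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.0, -0.1]]
    H = compose_coarse_to_fine(corners, [coarse, fine])
    print(apply_homography(H, corners))
```

In this sketch each level only has to explain the deformation left over after the previous levels, which is the intuition behind estimating residuals in a coarse-to-fine manner; a real system would obtain the offsets from a learned network rather than fixed values.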