Multimodal fusion detection places high demands on the imaging system and on image pre-processing, yet a high-quality pre-registration system and image registration processing are both costly. Unfortunately, existing fusion methods are designed for registered source images. They cannot achieve satisfactory performance when fusing inhomogeneous features, i.e., pairs of features at the same spatial location that express different semantic information. To address this, we propose IA-VFDnet, a hybrid CNN-Transformer learning framework with a unified high-quality multimodal feature matching module (AKM) and a fusion module (WDAF). AKM and WDAF work in synergy to perform high-quality infrared-aware visible fusion detection, which can be applied to smoke and wildfire detection. Experiments on the M3FD dataset validate the superiority of the proposed method: under conventional registered conditions, IA-VFDnet achieves better detection performance than other state-of-the-art methods. In addition, this letter releases the first openly available benchmark for unregistered multimodal smoke and wildfire detection.