Multi-modality fusion methods such as concatenation, summation, and encoder-decoder-based fusion have recently been employed to combine the modality characteristics of Hyperspectral Image (HSI) and Light Detection And Ranging (LiDAR) data. However, these methods consider the relationship between HSI and LiDAR signals from limited perspectives. More specifically, they overlook the contextual information across the HSI and LiDAR modalities and the intra-modality characteristics of LiDAR. In this paper, we offer a new perspective on feature fusion that explores the relationships across the HSI and LiDAR modalities comprehensively, and we propose an Interconnected Fusion (IF) framework. First, the center patch of the HSI input is extracted and replicated to the size of the HSI input. Then, nine different perspectives in a fusion matrix are generated by computing self-attention and cross-attention among the replicated center patch, the HSI input, and the corresponding LiDAR input. In this way, intra- and inter-modality characteristics are fully exploited, and contextual information is considered both within and across modalities. The nine interrelated elements of the fusion matrix complement each other and eliminate biases, yielding an accurate multi-modality representation for classification. Extensive experiments on three widely used datasets (Trento, MUUFL, and Houston) show that the IF framework achieves state-of-the-art results compared to existing approaches.
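To make the nine-perspective fusion matrix concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the HSI and LiDAR inputs are already embedded as token sequences of equal length and equal feature dimension, that the "center patch" is the middle token of a flattened square patch, and that attention is plain scaled dot-product attention without learned projections. The names `attention`, `replicate_center`, and `fusion_matrix` are hypothetical. The abstract does not say how the nine elements are combined for classification, so the sketch stops at the matrix itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, key_value):
    """Scaled dot-product attention (assumption: no learned Q/K/V projections).

    query and key_value are lists of token vectors with the same dimension.
    When both arguments are the same sequence this is self-attention,
    otherwise it is cross-attention.
    """
    dim = len(query[0])
    output = []
    for q in query:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in key_value
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * v[j] for w, v in zip(weights, key_value)) for j in range(dim)]
        )
    return output


def replicate_center(tokens):
    """Copy the middle token so the result has as many tokens as the input.

    Assumption: tokens is a flattened odd-sized square patch, so the
    center sits at index len(tokens) // 2.
    """
    center = tokens[len(tokens) // 2]
    return [list(center) for _ in tokens]


def fusion_matrix(hsi, lidar):
    """Return the 3 x 3 grid of attention outputs keyed by (query, key) source.

    The three diagonal entries are self-attention, the six off-diagonal
    entries are cross-attention.
    """
    sources = {"center": replicate_center(hsi), "hsi": hsi, "lidar": lidar}
    return {
        (q_name, k_name): attention(sources[q_name], sources[k_name])
        for q_name in sources
        for k_name in sources
    }
```

Calling `fusion_matrix(hsi, lidar)` with two lists of nine 4-dimensional tokens (a flattened 3 x 3 patch) returns a dictionary with nine entries, each holding nine 4-dimensional output tokens. In practice each input would first pass through learned projections, which are omitted here to keep the sketch short.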