Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events

In large-scale disaster events, planning optimal rescue routes depends on the ability to detect objects at the disaster scene, and one of the main challenges is the presence of dense and occluded objects. Existing methods, which are typically based on the RGB modality alone, struggle to distinguish targets with similar colors and textures in crowded environments and cannot identify occluded objects. To address this, we first construct two multimodal dense and occluded vehicle detection datasets for large-scale events, using the RGB and height map modalities. Based on these datasets, we propose a multimodal collaboration network for dense and occluded vehicle detection, MuDet for short. MuDet hierarchically enhances the completeness of discriminative information within and across modalities and differentiates between easy and hard samples. MuDet includes three main modules: Unimodal Feature Hierarchical Enhancement (Uni-Enh), Multimodal Cross Learning (Mul-Lea), and the Hard-easy Discriminative (He-Dis) Pattern. Uni-Enh enhances the features within each modality, and Mul-Lea facilitates the cross-integration of features from the two heterogeneous modalities. He-Dis separates densely occluded vehicle targets, which exhibit large intra-class differences and small inter-class differences, by defining and thresholding confidence values, thereby suppressing the complex background. Experimental results on two re-labeled multimodal benchmark datasets, the 4K-SAI-LCS dataset and the ISPRS Potsdam dataset, demonstrate the robustness and generalization ability of MuDet. The code is publicly available at \url{https://github.com/Shank2358/MuDet}.
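The abstract describes He-Dis only as "defining and thresholding confidence values". As a minimal sketch of what such thresholding could look like, assuming a simple two-threshold scheme, the code below partitions candidate detections into easy targets, hard targets (for example, dense or occluded vehicles), and suppressed background. The function `hard_easy_split`, the thresholds `tau_low` and `tau_high`, and their default values are hypothetical illustrations, not the definitions used in the MuDet code.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Partition:
    """Candidate indices grouped by confidence band (hypothetical scheme)."""
    easy: List[int] = field(default_factory=list)        # confident vehicle candidates
    hard: List[int] = field(default_factory=list)        # ambiguous, e.g. dense/occluded
    background: List[int] = field(default_factory=list)  # suppressed as background


def hard_easy_split(scores: Sequence[float],
                    tau_low: float = 0.3,
                    tau_high: float = 0.7) -> Partition:
    """Split candidates into easy / hard / background by thresholding confidence.

    ASSUMPTION: the two-threshold rule and the default values are illustrative
    only; they are not taken from MuDet's He-Dis module.
    """
    if not 0.0 <= tau_low < tau_high <= 1.0:
        raise ValueError("expected 0 <= tau_low < tau_high <= 1")
    part = Partition()
    for i, s in enumerate(scores):
        if s >= tau_high:
            part.easy.append(i)        # high confidence: easy sample
        elif s >= tau_low:
            part.hard.append(i)        # in-between band: hard sample
        else:
            part.background.append(i)  # low confidence: treated as background
    return part


if __name__ == "__main__":
    p = hard_easy_split([0.92, 0.55, 0.10, 0.71, 0.29, 0.30])
    print(p.easy, p.hard, p.background)  # [0, 3] [1, 5] [2, 4]
```

With these example scores, candidates 0 and 3 fall in the easy band, 1 and 5 in the hard band, and 2 and 4 are discarded as background. How He-Dis actually defines its confidence values and thresholds should be checked against the released repository.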
