Drone-based multi-object tracking is essential yet highly challenging due to small targets, severe occlusions, and cluttered backgrounds. Existing RGB-based tracking algorithms rely heavily on spatial appearance cues such as color and texture, which often degrade in aerial views and compromise reliability. Multispectral imagery, which captures pixel-level spectral reflectance, provides crucial cues that enhance object discriminability under degraded spatial conditions. However, the lack of dedicated multispectral UAV datasets has hindered progress in this domain. To bridge this gap, we introduce MMOT, the first challenging benchmark for drone-based multispectral multi-object tracking. It features three key characteristics: (i) Large Scale - 125 video sequences with over 488.8K annotations across eight categories; (ii) Comprehensive Challenges - covering diverse conditions such as extremely small targets, high-density scenarios, severe occlusions, and complex motion; and (iii) Precise Oriented Annotations - enabling accurate localization and reduced ambiguity under aerial perspectives. To better extract spectral features and leverage oriented annotations, we further present a multispectral and orientation-aware MOT scheme that adapts existing methods, featuring: (i) a lightweight Spectral 3D-Stem that integrates spectral features while preserving compatibility with RGB pretraining; (ii) an orientation-aware Kalman filter for precise state estimation; and (iii) an end-to-end orientation-adaptive transformer. Extensive experiments across representative trackers consistently show that multispectral input markedly improves tracking performance over RGB baselines, particularly for small and densely packed objects. We believe our work will advance research on drone-based multispectral multi-object tracking. Our MMOT dataset, code, and benchmarks are publicly available at https://github.com/Annzstbl/MMOT.
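To make the Spectral 3D-Stem idea more concrete, the following is a minimal sketch, assuming an 8-band multispectral input and a projection to three pseudo-RGB channels so that an RGB-pretrained backbone can be reused unchanged; the band count, channel widths, and overall stem design here are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of a spectral 3D stem (assumptions: 8 bands, 3-channel output
# so an RGB-pretrained backbone can consume the result unchanged).
import torch
import torch.nn as nn


class Spectral3DStem(nn.Module):
    """Mix neighbouring spectral bands with a 3D conv, then project to a
    3-channel map compatible with RGB-pretrained backbones."""

    def __init__(self, num_bands: int = 8, mid_channels: int = 16):
        super().__init__()
        # 3D conv jointly mixes adjacent bands and local spatial context.
        self.spectral_mix = nn.Sequential(
            nn.Conv3d(1, mid_channels, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
            nn.BatchNorm3d(mid_channels),
            nn.ReLU(inplace=True),
        )
        # Collapse the band axis, then project to 3 pseudo-RGB channels.
        self.to_rgb = nn.Conv2d(mid_channels * num_bands, 3, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, num_bands, H, W) multispectral frame.
        feat = self.spectral_mix(x.unsqueeze(1))  # (B, mid, num_bands, H, W)
        feat = feat.flatten(1, 2)                 # (B, mid * num_bands, H, W)
        return self.to_rgb(feat)                  # (B, 3, H, W)


# Usage: an 8-band frame becomes a 3-channel input for an RGB-pretrained tracker.
# frames = torch.randn(2, 8, 256, 256); rgb_like = Spectral3DStem()(frames)
```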
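Similarly, the orientation-aware Kalman filter can be sketched as a constant-velocity filter over an oriented-box state; this sketch assumes a state of [cx, cy, w, h, theta] plus velocities and simple isotropic noise, with the angle innovation wrapped so near-equivalent orientations do not produce large residuals. The exact state parameterisation and noise settings used in the paper may differ.

```python
# Minimal sketch of an orientation-aware constant-velocity Kalman filter
# (assumed state: [cx, cy, w, h, theta, and their velocities]).
import numpy as np


class OrientedKalmanFilter:
    def __init__(self, dt: float = 1.0):
        d = 5  # cx, cy, w, h, theta
        self.F = np.eye(2 * d)                              # constant-velocity transition
        self.F[:d, d:] = dt * np.eye(d)
        self.H = np.hstack([np.eye(d), np.zeros((d, d))])   # observe the oriented box only
        self.Q = 1e-2 * np.eye(2 * d)                       # process noise (assumed)
        self.R = 1e-1 * np.eye(d)                           # measurement noise (assumed)

    def initiate(self, box: np.ndarray):
        # box: [cx, cy, w, h, theta]; velocities start at zero.
        return np.r_[box, np.zeros(5)], np.eye(10)

    def predict(self, x: np.ndarray, P: np.ndarray):
        x = self.F @ x
        P = self.F @ P @ self.F.T + self.Q
        return x, P

    def update(self, x: np.ndarray, P: np.ndarray, box: np.ndarray):
        # Wrap the angle residual into [-pi/2, pi/2) so that near-equivalent
        # orientations do not produce a spuriously large innovation.
        y = box - self.H @ x
        y[4] = (y[4] + np.pi / 2) % np.pi - np.pi / 2
        S = self.H @ P @ self.H.T + self.R
        K = P @ self.H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(10) - K @ self.H) @ P
        return x, P
```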