Existing person re-identification methods have made remarkable progress in appearance-based identity association across homogeneous cameras, such as ground-to-ground matching. However, the more practical scenario of aerial-ground person re-identification (AGPReID) across heterogeneous cameras has received little attention. The most significant challenge in AGPReID is the dramatic view discrepancy, which disrupts discriminative identity representations. To alleviate this, we propose the view-decoupled transformer (VDT), a simple yet effective framework. VDT uses two major components to decouple view-related and view-unrelated features: hierarchical subtractive separation, which separates the two kinds of features inside VDT, and an orthogonal loss, which constrains them to be independent. In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five aerial and eight ground cameras, 5,000 identities, and 108,563 images. Experiments on two datasets show that VDT is a feasible and effective solution for AGPReID, surpassing the previous method on mAP/Rank1 by up to 5.0%/2.7% on CARGO and 3.7%/5.2% on AG-ReID while keeping computational complexity at the same order of magnitude. Our project is available at https://github.com/LinlyAC/VDT-AGPReID
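The two decoupling components can be illustrated with a minimal sketch. The abstract does not give their exact formulas, so the forms below are assumptions rather than the authors' implementation: the view-unrelated feature is taken as an element-wise difference between a global feature and a view-related feature, and the orthogonal loss is taken as the absolute cosine similarity between the two decoupled features. The function names `subtractive_separation` and `orthogonal_loss` are hypothetical, and only a single separation step is shown, not the hierarchical (per-layer) version.

```python
import math


def subtractive_separation(global_feat, view_feat):
    """Assumed form: view-unrelated feature = global feature - view-related feature."""
    return [g - v for g, v in zip(global_feat, view_feat)]


def orthogonal_loss(feat_a, feat_b, eps=1e-12):
    """Assumed form: |cosine similarity|, which is 0 when the features are orthogonal."""
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(a * a for a in feat_a))
    norm_b = math.sqrt(sum(b * b for b in feat_b))
    return abs(dot) / (norm_a * norm_b + eps)


# Toy 2-D features, chosen only for illustration.
global_feat = [3.0, 1.0]
view_feat = [0.0, 1.0]

# Remove the view-related part from the global feature.
identity_feat = subtractive_separation(global_feat, view_feat)

# Penalty that would be minimised during training to keep the two parts independent.
loss = orthogonal_loss(identity_feat, view_feat)
```

In a training setup, a penalty of this kind would typically be added to the usual identity losses so that the view-related and view-unrelated features are pushed toward independence; the weighting between the terms is not specified in the abstract.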