Aerial-Ground Person Re-Identification (AGPReID) remains highly challenging because of the drastic viewpoint variations between drones and fixed cameras. Existing methods typically follow a view-invariant paradigm, aligning features shared across views to achieve robustness. However, the view-invariant paradigm inherently enforces part-level alignment, which ignores view-specific cues and discriminative identity information. To address this, this work proposes ViSA (View-aware Semantic Alignment), a view-aware framework for cross-view semantic consistency that comprises an Expert-driven Token Generation Module (ETGM) and a Dual-branch Local Fusion Module (DLFM). The former constructs a set of view-aware experts that generate adaptive semantic queries sensitive to viewpoint-specific patterns, while the latter uses graph reasoning to extract and align the local regions that respond to different experts. Extensive experiments on three AGPReID benchmarks (AG-ReID.v2, CARGO, and LAGPeR) demonstrate that ViSA consistently achieves superior performance, including a notable 10.06\% mAP improvement on the challenging CARGO cross-view protocol. The code is available at \href{https://github.com/Cat-Zero/ViSA}{https://github.com/Cat-Zero/ViSA}.
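To make the idea of expert-generated semantic queries more concrete, the following is a minimal illustrative sketch and not the ViSA implementation. It assumes a simple softmax gate over a handful of linear "experts" that turn a global image feature into one query each, and it uses plain cross-attention over patch tokens as a stand-in for the graph reasoning in DLFM. All names (`expert_queries`, `attend_local`, `expert_weights`, `gate_weights`) and all shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def expert_queries(global_feat, expert_weights, gate_weights):
    """Generate one gated query per expert (hypothetical formulation).

    global_feat:    (d,)      global image feature
    expert_weights: (k, d, d) one linear map per expert
    gate_weights:   (k, d)    gating projection
    Returns queries of shape (k, d) and gate scores of shape (k,).
    """
    gate = softmax(gate_weights @ global_feat)                 # (k,), sums to 1
    raw = np.einsum("kij,j->ki", expert_weights, global_feat)  # (k, d)
    return gate[:, None] * raw, gate


def attend_local(queries, patch_tokens):
    """Pool patch tokens per query with scaled dot-product attention.

    This is an assumed stand-in for graph reasoning, not the paper's DLFM.
    queries: (k, d), patch_tokens: (n, d)
    Returns local features (k, d) and attention weights (k, n).
    """
    d = queries.shape[-1]
    attn = softmax(queries @ patch_tokens.T / np.sqrt(d), axis=-1)
    return attn @ patch_tokens, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, d, n = 4, 8, 16  # experts, feature dim, patch tokens (arbitrary)
    queries, gate = expert_queries(
        rng.normal(size=d),
        rng.normal(size=(k, d, d)),
        rng.normal(size=(k, d)),
    )
    local, attn = attend_local(queries, rng.normal(size=(n, d)))
    print(queries.shape, local.shape, attn.shape)  # (4, 8) (4, 8) (4, 16)
```

In this sketch each expert yields its own query, so different experts can attend to different local regions of the same image. How ViSA actually conditions the experts on viewpoint and how it aligns the resulting regions across views is not specified in the abstract; the linked repository is the place to check.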