Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models that maintain identity consistency despite drastic changes in camera viewpoint. This idea is natural, but designing a view-robust model is very challenging. Moreover, these methods overlook the contribution of view-specific features to the model's ability to represent persons. To address these issues, we propose SD-ReID, a novel generative framework for AG-ReID that leverages generative models to mimic the feature distributions of different views while extracting robust identity representations. More specifically, we first train a ViT-based model to extract person representations along with controllable conditions, namely identity and view conditions. We then fine-tune the Stable Diffusion (SD) model to enhance person representations under the guidance of these controllable conditions. Furthermore, we introduce the View-Refined Decoder (VRD) to bridge the gap between instance-level and global-level features. Finally, both person representations and all-view features are employed to retrieve target persons. Extensive experiments on five AG-ReID benchmarks (i.e., CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR and G2APS-ReID) demonstrate the effectiveness of our proposed method. The source code will be made available.
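The final retrieval step, in which person representations and all-view features are used together, could look roughly like the sketch below. The abstract does not say how the two kinds of features are combined or which distance is used, so the concatenation-based fusion, the cosine-similarity ranking, and every name here (`l2_normalize`, `fuse`, `rank_gallery`) are illustrative assumptions rather than the SD-ReID implementation.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def fuse(person_feat, view_feats):
    """Combine one person representation with its per-view features.

    Assumption: fusion is a plain concatenation of L2-normalized parts.
    The abstract does not specify the actual fusion rule.
    """
    fused = l2_normalize(person_feat)
    for feat in view_feats:
        fused.extend(l2_normalize(feat))
    return l2_normalize(fused)


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity.

    Inputs are expected to be unit-norm vectors, so the dot product
    equals the cosine similarity.
    """
    sims = [sum(q * g for q, g in zip(query, item)) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: -sims[i])
```

Under these assumptions, a query is ranked with `rank_gallery(fuse(q_person, q_views), [fuse(p, v) for p, v in gallery_items])`, where each element of the hypothetical `gallery_items` pairs one person representation with a list of per-view feature vectors.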