Visible-infrared person re-identification (VI-ReID) is challenging because of large cross-modality discrepancies and intra-class variations. Existing methods mainly learn modality-shared representations by embedding the different modalities into the same feature space. As a result, the learned features emphasize patterns common to both modalities while suppressing modality-specific and identity-aware information that is valuable for Re-ID. To address this limitation, we propose a novel Modality Unifying Network (MUN) that explores a robust auxiliary modality for VI-ReID. First, the auxiliary modality is generated by combining the proposed cross-modality learner and intra-modality learner, which together dynamically model modality-specific and modality-shared representations to alleviate both cross-modality and intra-modality variations. Second, an identity alignment loss is proposed that aligns identity centers across the three modalities (visible, infrared, and auxiliary) to discover discriminative feature representations. Third, a modality alignment loss is introduced that consistently reduces the distribution distance between visible and infrared images through modality prototype modeling. Extensive experiments on multiple public datasets demonstrate that the proposed method surpasses current state-of-the-art methods by a significant margin.
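To make the idea of aligning identity centers across the three modalities concrete, the following is a minimal sketch and not the paper's formulation, which the abstract does not give. It assumes that an identity center is the mean feature of one identity within one modality, and that alignment is measured as the average squared Euclidean distance between the centers of the same identity in each pair of modalities. The function names `identity_centers` and `center_alignment_loss` are hypothetical.

```python
import numpy as np

def identity_centers(features, labels):
    """Return {identity: mean feature vector} for one modality."""
    return {pid: features[labels == pid].mean(axis=0) for pid in np.unique(labels)}

def center_alignment_loss(feats_by_modality, labels_by_modality):
    """Average squared Euclidean distance between the centers of the same
    identity in every pair of modalities.

    Assumed formulation for illustration only; the actual identity
    alignment loss of MUN may differ.
    """
    centers = [identity_centers(f, l)
               for f, l in zip(feats_by_modality, labels_by_modality)]
    total, count = 0.0, 0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            # Only identities present in both modalities can be compared.
            for pid in centers[i].keys() & centers[j].keys():
                diff = centers[i][pid] - centers[j][pid]
                total += float(diff @ diff)
                count += 1
    return total / count if count else 0.0

# Toy usage: three samples, two identities, 2-D features per modality.
labels = np.array([0, 0, 1])
visible = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0]])
infrared = visible + 1.0    # hypothetical modality shift
auxiliary = visible + 0.5   # hypothetical auxiliary modality in between
loss = center_alignment_loss([visible, infrared, auxiliary],
                             [labels, labels, labels])
```

In this toy setup the loss is zero when all three modalities produce identical centers and grows as the centers of the same identity drift apart, which is the behavior an alignment term of this kind is meant to penalize.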