Visible-infrared person re-identification (VI-ReID) is a challenging task due to large cross-modality discrepancies and intra-class variations. Existing methods mainly focus on learning modality-shared representations by embedding different modalities into the same feature space. As a result, the learned features emphasize the common patterns across modalities while suppressing modality-specific and identity-aware information that is valuable for Re-ID. To address this issue, we propose a novel Modality Unifying Network (MUN) to explore a robust auxiliary modality for VI-ReID. First, the auxiliary modality is generated by combining the proposed cross-modality learner and intra-modality learner, which together dynamically model modality-specific and modality-shared representations to alleviate both cross-modality and intra-modality variations. Second, an identity alignment loss is proposed that aligns identity centers across the three modalities (visible, infrared, and auxiliary) to learn discriminative feature representations. Third, a modality alignment loss is introduced to consistently reduce the distribution distance between visible and infrared images through modality prototype modeling. Extensive experiments on multiple public datasets demonstrate that the proposed method surpasses the current state-of-the-art methods by a significant margin.
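The abstract does not give the form of the identity alignment loss, so the following is only a rough sketch of what aligning identity centers across modalities could look like. It assumes the loss is the squared Euclidean distance between same-identity centers, averaged over all modality pairs; that form and the function names `identity_centers` and `identity_alignment_loss` are illustrative assumptions and may differ from the actual definition in MUN.

```python
from collections import defaultdict
from itertools import combinations


def identity_centers(features, labels):
    """Return {identity: mean feature vector} for one modality."""
    grouped = defaultdict(list)
    for vec, pid in zip(features, labels):
        grouped[pid].append(vec)
    return {
        pid: [sum(col) / len(vecs) for col in zip(*vecs)]
        for pid, vecs in grouped.items()
    }


def identity_alignment_loss(modalities):
    """Average squared distance between same-identity centers.

    `modalities` is a list of (features, labels) pairs, one per modality
    (e.g. visible, infrared, auxiliary). The squared-Euclidean form is an
    assumption made for illustration, not the definition used in MUN.
    """
    if len(modalities) < 2:
        raise ValueError("need at least two modalities")
    centers = [identity_centers(f, y) for f, y in modalities]
    ids = set(centers[0])
    if any(set(c) != ids for c in centers[1:]):
        raise ValueError("all modalities must contain the same identities")

    total, count = 0.0, 0
    # Compare every pair of modalities, identity by identity.
    for ca, cb in combinations(centers, 2):
        for pid in ids:
            total += sum((a - b) ** 2 for a, b in zip(ca[pid], cb[pid]))
            count += 1
    return total / count
```

Under this assumed form the loss is zero when every identity has the same center in all modalities, and it grows as the centers of any one modality drift away from the others.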