Cross-spectral person re-identification, which aims to match pedestrian identities across different spectra, is challenged mainly by the modality discrepancy. In this paper, we address the problem at both the image level and the feature level in an end-to-end hybrid learning framework named robust feature mining network (RFM). In particular, we observe that the reflective intensity of the same surface in photos taken at different wavelengths can be related by a linear model. We further show that the variation of the linear factor across different surfaces is the main cause of the modality discrepancy. We integrate this reflection observation into an image-level data augmentation by proposing the linear transformation generator (LTG). Moreover, at the feature level, we introduce a cross-center loss to encourage a more compact intra-class distribution and a modality-aware spatial attention to exploit textured regions more efficiently. Experimental results on two standard cross-spectral person re-identification datasets, i.e., RegDB and SYSU-MM01, demonstrate state-of-the-art performance.
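To make the reflection observation concrete, the following is a minimal sketch of a per-surface linear intensity augmentation. It is not the LTG itself, whose design is not specified here. The function name `linear_spectral_augment`, the `surface_map` of region labels, and the `factor_range` default are all hypothetical choices made for illustration. The sketch only assumes what the observation states: pixels on the same surface share one linear factor, and the factor varies from surface to surface.

```python
import random


def linear_spectral_augment(image, surface_map, factor_range=(0.5, 1.5), rng=None):
    """Scale a grayscale image by one random linear factor per surface.

    image:        2D list of intensities in [0, 1].
    surface_map:  2D list of integer region labels, same shape as image
                  (hypothetical input; how surfaces are obtained is assumed).
    factor_range: range from which each surface's factor is drawn (assumed).
    Returns the augmented image and the factor chosen for each label.
    """
    rng = rng or random.Random()

    # Draw one linear factor per distinct surface label.
    labels = sorted({label for row in surface_map for label in row})
    factors = {label: rng.uniform(*factor_range) for label in labels}

    # Apply the surface's factor to each pixel and clip to the valid range.
    augmented = [
        [min(1.0, max(0.0, factors[label] * pixel))
         for pixel, label in zip(image_row, label_row)]
        for image_row, label_row in zip(image, surface_map)
    ]
    return augmented, factors


if __name__ == "__main__":
    image = [[0.2, 0.4], [0.2, 0.4]]
    surface_map = [[0, 1], [0, 1]]  # two surfaces: left and right column
    augmented, factors = linear_spectral_augment(
        image, surface_map, rng=random.Random(0)
    )
    print(factors)
    print(augmented)
```

Because every pixel with the same label is scaled by the same factor, relative intensities within a surface are preserved while the contrast between surfaces changes, which is the kind of variation the observation attributes to a change of spectrum.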