Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same pedestrian across different modalities, where the main challenge is the significant modality discrepancy. To alleviate the modality gap, recent methods generate intermediate images using GANs, grayscaling, or mixup strategies. However, these methods may introduce additional data distributions, and they do not learn the semantic correspondence between the two modalities well. In this paper, we propose a Patch-Mixed Cross-Modality framework (PMCM), in which two images of the same person from the two modalities are split into patches and stitched into a new image for model learning. A part-alignment loss is introduced to regularize representation learning, and a patch-mixed modality learning loss is proposed to align the two modalities. In this way, the model learns to recognize a person from patches of different styles, so that the semantic correspondence between modalities can be inferred. In addition, because the image generation strategy is flexible, the ratio of patches from each modality in a patch-mixed image can be adjusted freely, which could further alleviate the modality imbalance problem. On two VI-ReID datasets, we report new state-of-the-art performance with the proposed method.
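The patch-mixing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `patch_mix`, the parameters `patch_size` and `ir_ratio`, the uniform random choice of patches, and the use of NumPy are all assumptions made for illustration. The sketch also assumes both images are already resized to the same height and width, and that the infrared image has been replicated to the same number of channels as the visible one. The part-alignment and patch-mixed modality learning losses are not covered here.

```python
import numpy as np


def patch_mix(visible, infrared, patch_size=16, ir_ratio=0.5, rng=None):
    """Stitch patches from two same-identity images into one image.

    Hypothetical helper (not from the paper). Both inputs are H x W x C
    arrays of identical shape, with H and W divisible by patch_size.
    ir_ratio is the fraction of patches taken from the infrared image.
    Returns the mixed image and the boolean patch grid (True = infrared).
    """
    if visible.ndim != 3 or visible.shape != infrared.shape:
        raise ValueError("expected two H x W x C images of identical shape")
    height, width = visible.shape[:2]
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    if not 0.0 <= ir_ratio <= 1.0:
        raise ValueError("ir_ratio must lie in [0, 1]")

    rng = np.random.default_rng() if rng is None else rng
    rows, cols = height // patch_size, width // patch_size
    num_patches = rows * cols
    num_ir = int(round(ir_ratio * num_patches))

    # Pick which grid cells are filled from the infrared image.
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[rng.choice(num_patches, size=num_ir, replace=False)] = True
    patch_mask = patch_mask.reshape(rows, cols)

    # Expand the per-patch mask to a per-pixel mask.
    pixel_mask = np.repeat(np.repeat(patch_mask, patch_size, axis=0),
                           patch_size, axis=1)

    # Take infrared pixels where the mask is set, visible pixels elsewhere.
    mixed = np.where(pixel_mask[..., None], infrared, visible)
    return mixed, patch_mask
```

Under these assumptions, changing `ir_ratio` changes how many patches come from each modality, which corresponds to the adjustable modality ratio mentioned in the abstract. How the actual method selects patches and ratios is not specified in this text.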