Visible-infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance. Existing methods mainly focus on learning a shared feature space to reduce the discrepancy between the visible and infrared modalities, which still leaves two problems underexplored: information redundancy and modality complementarity. Properly eliminating identity-irrelevant information while compensating for missing modality-specific information is therefore critical and remains challenging. To tackle these problems, we present a novel mutual information and modality consensus network, namely CMInfoNet, which extracts modality-invariant identity features that carry the most representative information while reducing redundancy. The key insight of our method is to find an optimal representation that captures more identity-relevant information and compresses the irrelevant parts by optimizing a mutual information bottleneck trade-off. In addition, we propose an automatic search strategy to find the most prominent parts for identifying pedestrians. To eliminate cross- and intra-modality variations, we also devise a modality consensus module that aligns the visible and infrared modalities for task-specific guidance. Moreover, global-local feature representations are acquired to discriminate key parts. Experimental results on six benchmarks, i.e., SYSU-MM01, RegDB, Occluded-DukeMTMC, Occluded-REID, Partial-REID, and Partial_iLIDS, demonstrate the effectiveness of CMInfoNet.
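To make the bottleneck trade-off concrete, the following minimal sketch evaluates the classical information bottleneck score I(Z;Y) - beta * I(Z;X) on a toy discrete problem. It is an illustration under stated assumptions, not the CMInfoNet objective: the function names, the toy distributions, and the value of beta are all hypothetical, and the actual network would presumably optimize a learned, differentiable estimate rather than exact probability tables.

```python
import math


def mutual_information(joint):
    """Return I(A;B) in nats for a joint probability table joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (pa[a] * pb[b]))
    return mi


def ib_objective(p_xy, p_z_given_x, beta):
    """Classical information bottleneck score I(Z;Y) - beta * I(Z;X).

    p_xy[x][y]        : joint distribution of input x and identity y
    p_z_given_x[x][z] : stochastic encoder (hypothetical, for illustration)
    beta              : weight of the compression term (assumed value)
    """
    n_x, n_y, n_z = len(p_xy), len(p_xy[0]), len(p_z_given_x[0])
    p_x = [sum(row) for row in p_xy]
    # p(x, z) = p(x) * p(z | x)
    p_xz = [[p_x[x] * p_z_given_x[x][z] for z in range(n_z)]
            for x in range(n_x)]
    # p(z, y) = sum_x p(x, y) * p(z | x), since Z depends on Y only through X
    p_zy = [[sum(p_xy[x][y] * p_z_given_x[x][z] for x in range(n_x))
             for y in range(n_y)]
            for z in range(n_z)]
    return mutual_information(p_zy) - beta * mutual_information(p_xz)


# Toy data: four equally likely inputs, two identities (inputs 0-1 vs. 2-3).
p_xy = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]

# Encoder that keeps every input distinct (no compression).
keep_all = [[1.0 if z == x else 0.0 for z in range(4)] for x in range(4)]

# Encoder that merges inputs sharing an identity (drops irrelevant detail).
merge_by_identity = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

score_keep = ib_objective(p_xy, keep_all, beta=0.5)
score_merge = ib_objective(p_xy, merge_by_identity, beta=0.5)
```

In this toy setting both encoders preserve all identity information, but the merged code carries less information about the raw input, so it receives the higher score. This mirrors the stated goal of keeping identity-relevant information while compressing the rest.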