Unsupervised learning visible-infrared person re-identification (USL-VI-ReID) aims to learn modality-invariant features from unlabeled cross-modality datasets, which is crucial for practical video surveillance systems. The key to the USL-VI-ReID task is solving the cross-modality data association problem, which is a prerequisite for heterogeneous joint learning. To this end, we propose a Dual Optimal Transport Label Assignment (DOTLA) framework that simultaneously assigns the labels generated in one modality to its counterpart modality. The two assignment directions reinforce each other and provide an efficient solution to cross-modality data association, which effectively reduces the side effects of insufficient and noisy label associations. In addition, we propose a cross-modality neighbor consistency guided label refinement and regularization module to eliminate the negative effects of inaccurate supervision signals, under the assumption that the prediction or label distribution of each example should be similar to those of its nearest neighbors. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method, which surpasses the existing state-of-the-art approach by a large margin of 7.76% mAP on average and even outperforms some supervised VI-ReID methods.
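To make the two ingredients named above more concrete, the following is a minimal NumPy sketch of optimal-transport label assignment followed by neighbor-based label refinement. It is an illustration under stated assumptions, not the DOTLA implementation: the abstract does not specify the solver, the marginals, or the refinement rule. The sketch assumes an entropy-regularised Sinkhorn solver with uniform marginals, cosine similarity between L2-normalised features and cluster prototypes, and a simple convex mixture of each sample's soft label with the mean label of its nearest cross-modality neighbors. The function names and the parameters `eps`, `n_iters`, `k`, and `alpha` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def sinkhorn_assign(sim, eps=0.05, n_iters=100):
    """Entropy-regularised transport plan between N samples and K clusters.

    sim: (N, K) similarities between samples of one modality and cluster
    prototypes of the other modality. Uniform marginals are an assumption:
    each sample carries mass 1/N and each cluster receives mass 1/K.
    """
    n, k = sim.shape
    # Shift by the maximum before exponentiating for numerical stability.
    kernel = np.exp((sim - sim.max()) / eps)
    row_mass = np.full(n, 1.0 / n)
    col_mass = np.full(k, 1.0 / k)
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        u = row_mass / (kernel @ v)    # rescale rows toward row_mass
        v = col_mass / (kernel.T @ u)  # rescale columns toward col_mass
    return u[:, None] * kernel * v[None, :]


def refine_with_neighbors(feats, soft, ref_feats, ref_soft, k=3, alpha=0.5):
    """Mix each soft label with the mean label of its k nearest neighbors.

    Neighbors are searched in ref_feats (here: samples of the other
    modality), whose labels ref_soft live in the same label space as soft.
    """
    sim = feats @ ref_feats.T                      # (N, M) similarities
    nn_idx = np.argsort(-sim, axis=1)[:, :k]       # (N, k) neighbor indices
    neighbor_mean = ref_soft[nn_idx].mean(axis=1)  # (N, K) averaged labels
    refined = alpha * soft + (1.0 - alpha) * neighbor_mean
    return refined / refined.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for encoder outputs; real features would come from a network.
    proto_visible = l2_normalize(rng.normal(size=(4, 16)))   # visible cluster centers
    feat_visible = l2_normalize(rng.normal(size=(20, 16)))   # visible samples
    feat_infrared = l2_normalize(rng.normal(size=(12, 16)))  # infrared samples

    # One direction of the dual assignment: infrared samples -> visible clusters.
    plan = sinkhorn_assign(feat_infrared @ proto_visible.T)
    soft_infrared = plan / plan.sum(axis=1, keepdims=True)

    # Visible samples take one-hot labels from their nearest visible prototype.
    onehot_visible = np.eye(4)[(feat_visible @ proto_visible.T).argmax(axis=1)]
    refined = refine_with_neighbors(feat_infrared, soft_infrared,
                                    feat_visible, onehot_visible)
    print(refined.argmax(axis=1))
```

In this sketch the second direction of the dual assignment would be obtained by swapping the roles of the two modalities, that is, by calling `sinkhorn_assign` on the similarities between visible samples and infrared cluster prototypes. The uniform column marginal discourages the degenerate solution in which all samples collapse onto a few clusters, which is a common motivation for using optimal transport in pseudo-label assignment.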