Weakly supervised person search aims to perform joint pedestrian detection and re-identification (re-id) with only person bounding-box annotations. Recently, contrastive learning has been applied to weakly supervised person search, where the two common contrast strategies are memory-based contrast and intra-image contrast. We argue that current intra-image contrast is shallow and therefore suffers from spatial-level and occlusion-level variance. In this paper, we present a novel deep intra-image contrastive learning method using a Siamese network. Its two key modules are spatial-invariant contrast (SIC) and occlusion-invariant contrast (OIC). SIC performs many-to-one contrasts between the two branches of the Siamese network and dense prediction contrasts within one branch. With these many-to-one and dense contrasts, SIC tends to learn discriminative scale-invariant and location-invariant features, which addresses spatial-level variance. OIC enhances feature consistency with a masking strategy to learn occlusion-invariant features. Extensive experiments are performed on two person search datasets, CUHK-SYSU and PRW. Our method achieves state-of-the-art performance among weakly supervised one-step person search approaches. We hope that our simple intra-image contrastive learning can offer a new paradigm for weakly supervised person search. The source code is available at \url{https://github.com/jiabeiwangTJU/DICL}.
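To make the masking idea behind OIC concrete, the following is a minimal sketch of a feature-consistency objective between an unmasked and a masked view of the same feature map. The abstract does not specify the loss, the pooling, or the mask shape, so everything here is an assumption for illustration: the helper names (`global_avg_pool`, `mask_region`, `consistency_loss`) are hypothetical, the mask is a zeroed rectangle, and consistency is measured as one minus cosine similarity. It is not the authors' implementation.

```python
import math

def global_avg_pool(feat):
    """Average each channel of a C x H x W nested list into one value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def mask_region(feat, top, left, height, width):
    """Return a copy of feat with a rectangle zeroed in every channel.

    Zeroing a rectangle is an assumed stand-in for an occlusion mask.
    """
    masked = [[row[:] for row in ch] for ch in feat]
    for ch in masked:
        for y in range(top, top + height):
            for x in range(left, left + width):
                ch[y][x] = 0.0
    return masked

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)

def consistency_loss(feat, top, left, height, width):
    """Assumed objective: 1 - cos(pooled full view, pooled masked view)."""
    full = global_avg_pool(feat)
    occluded = global_avg_pool(mask_region(feat, top, left, height, width))
    return 1.0 - cosine_similarity(full, occluded)

# Toy 2-channel, 4 x 4 feature map for one person region.
feat = [
    [[1.0] * 4 for _ in range(4)],
    [[2.0, 0.0, 2.0, 0.0] for _ in range(4)],
]

# Mask the leftmost column; the loss grows as masking changes the embedding.
loss = consistency_loss(feat, top=0, left=0, height=4, width=1)
no_mask_loss = consistency_loss(feat, top=0, left=0, height=0, width=0)
print(loss, no_mask_loss)
```

Under these assumptions, minimizing such a loss during training would push the embedding of a partially occluded person toward the embedding of the unoccluded one, which is the sense in which the learned features become occlusion-invariant.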