Large-scale pre-training has proven effective for improving performance across a range of tasks. Current person search methods use ImageNet pre-trained models for feature extraction, which is suboptimal because of the gap between the pre-training task and the downstream person search task. In this paper, we therefore focus on pre-training for person search, a task that involves detecting and re-identifying individuals simultaneously. Although labeled data for person search is scarce, datasets for its two sub-tasks, person detection and re-identification, are relatively abundant. To this end, we propose a hybrid pre-training framework designed specifically for person search that uses sub-task data only. It consists of a hybrid learning paradigm that handles data with different kinds of supervision, and an intra-task alignment module that alleviates domain discrepancy under limited resources. To the best of our knowledge, this is the first work to investigate how to support full-task pre-training using sub-task data. Extensive experiments demonstrate that our pre-trained model achieves significant improvements across diverse protocols, varying the person search method, fine-tuning data, pre-training data, and model backbone. For example, our model improves the mAP of ResNet50-based NAE by 10.3% in relative terms. Our code and pre-trained models are released for plug-and-play use by the person search community.