Searching for a specific person has great security value and social benefit, and it often involves a combination of visual and textual information. Conventional person retrieval methods, whether image-based or text-based, usually fail to exploit both types of information effectively, which reduces accuracy. In this paper, a new task called Composed Person Retrieval (CPR) is proposed to jointly utilize image and text information for target person retrieval. However, supervised CPR depends on costly manually annotated datasets, and no such resources are currently available. To mitigate this issue, we first introduce Zero-shot Composed Person Retrieval (ZS-CPR), which leverages existing domain-related data to solve the CPR problem without relying on expensive annotations. Second, to learn a ZS-CPR model, we propose a two-stage learning framework, Word4Per, in which a lightweight Textual Inversion Network (TINet) and a text-based person retrieval model built on a fine-tuned Contrastive Language-Image Pre-training (CLIP) network are learned without using any CPR data. Third, a finely annotated Image-Text Composed Person Retrieval (ITCPR) dataset is built as a benchmark to assess the performance of the proposed Word4Per framework. Extensive experiments measured by both Rank-1 and mAP demonstrate the effectiveness of Word4Per on the ZS-CPR task, surpassing the compared methods by over 10%. The code and ITCPR dataset will be publicly available at https://github.com/Delong-liu-bupt/Word4Per.
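Rank-1 and mAP are standard retrieval metrics. As an illustration only, the following sketch shows how they can be computed from a query-by-gallery similarity matrix. The function names, the input format, and the assumption that each query comes with a known set of relevant gallery indices are hypothetical; this is not taken from the Word4Per code.

```python
from typing import List, Sequence, Set, Tuple


def rank_gallery(scores: Sequence[float]) -> List[int]:
    """Return gallery indices sorted by descending similarity score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(ranking: Sequence[int], relevant: Set[int]) -> float:
    """Average precision of one ranked list against its relevant gallery items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def rank1_and_map(
    score_matrix: Sequence[Sequence[float]],
    relevant_sets: Sequence[Set[int]],
) -> Tuple[float, float]:
    """Compute Rank-1 accuracy and mean average precision over all queries.

    score_matrix[q][g] is the (hypothetical) similarity between query q
    and gallery item g; relevant_sets[q] holds the correct gallery indices.
    """
    n = len(score_matrix)
    if n == 0:
        raise ValueError("need at least one query")
    rank1_hits = 0
    ap_total = 0.0
    for scores, relevant in zip(score_matrix, relevant_sets):
        ranking = rank_gallery(scores)
        if ranking and ranking[0] in relevant:
            rank1_hits += 1  # top-ranked gallery item is a correct match
        ap_total += average_precision(ranking, relevant)
    return rank1_hits / n, ap_total / n
```

For example, with two queries scored as `[[0.9, 0.1, 0.5], [0.2, 0.8, 0.7]]` and correct matches `[{0}, {2}]`, only the first query ranks its match first, so Rank-1 is 0.5; the second query finds its match at rank 2 (AP 0.5), giving mAP 0.75.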