Exploiting the Textual Potential from Vision-Language Pre-training for Text-based Person Search

Text-based Person Search (TPS), is targeted on retrieving pedestrians to match text descriptions instead of query images. Recent Vision-Language Pre-training (VLP) models can bring transferable knowledge to downstream TPS tasks, resulting in more efficient performance gains. However, existing TPS methods improved by VLP only utilize pre-trained visual encoders, neglecting the corresponding textual representation and breaking the significant modality alignment learned from large-scale pre-training. In this paper, we explore the full utilization of textual potential from VLP in TPS tasks. We build on the proposed VLP-TPS baseline model, which is the first TPS model with both pre-trained modalities. We propose the Multi-Integrity Description Constraints (MIDC) to enhance the robustness of the textual modality by incorporating different components of fine-grained corpus during training. Inspired by the prompt approach for zero-shot classification with VLP models, we propose the Dynamic Attribute Prompt (DAP) to provide a unified corpus of fine-grained attributes as language hints for the image modality. Extensive experiments show that our proposed TPS framework achieves state-of-the-art performance, exceeding the previous best method by a margin.

翻译：基于文本的行人搜索（TPS）旨在检索与文本描述匹配的行人，而非使用查询图像。最近的视觉-语言预训练（VLP）模型能够为下游TPS任务带来可迁移的知识，从而更有效地提升性能。然而，现有借助VLP改进的TPS方法仅利用了预训练的视觉编码器，忽视了相应的文本表示，破坏了从大规模预训练中学习到的关键模态对齐。本文探索了在TPS任务中充分利用VLP中的文本潜力。我们基于所提出的VLP-TPS基线模型构建，该模型是首个同时包含两种预训练模态的TPS模型。我们提出了多完整性描述约束（MIDC），通过在训练中融入细粒度语料库的不同成分来增强文本模态的鲁棒性。受VLP模型在零样本分类中采用提示方法的启发，我们提出了动态属性提示（DAP），为图像模态提供统一的细粒度属性语言提示。大量实验表明，我们提出的TPS框架达到了最先进性能，显著优于此前最佳方法。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日