Text-Pedestrian Image Retrieval aims to retrieve the pedestrian image that matches a textual description of a person's appearance. The task is challenging not only because of the modality discrepancy between text and images, but also because pedestrians with the same identity can be described by highly diverse texts. Although existing methods have made progress on text-pedestrian image retrieval, they do not address both problems comprehensively. To this end, this paper proposes a method that combines progressive feature mining with external knowledge-assisted feature purification. Specifically, a progressive mining mode enables the model to mine discriminative features from previously neglected information, which avoids the loss of discriminative information and improves the expressive power of the features. In addition, to further reduce the negative impact of modality discrepancy and textual diversity on cross-modal matching, we propose to use knowledge from other samples of the same modality, i.e., external knowledge, to enhance identity-consistent features and weaken identity-inconsistent features. This process purifies the features and alleviates the interference caused by textual diversity and by features correlated with negative samples of the same modality. Extensive experiments on three challenging datasets demonstrate the effectiveness and superiority of the proposed method, whose retrieval performance even surpasses that of large-scale model-based methods on large-scale datasets.