Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening

Despite rapid gains in accuracy, current image-text retrieval methods have a time complexity that depends on $N$, which hinders their application in practice. To improve efficiency, this paper presents a simple and effective keyword-guided pre-screening framework for image-text retrieval. Specifically, we convert the image and text data into keywords and match keywords across modalities to exclude a large number of irrelevant gallery samples before they reach the retrieval network. For keyword prediction, we cast the task as a multi-label classification problem and propose a multi-task learning scheme that appends multi-label classifiers to the image-text retrieval network, yielding lightweight, high-performance keyword prediction. For keyword matching, we adopt the inverted index used in search engines, which benefits both the time and space complexity of the pre-screening. Extensive experiments on two widely used datasets, Flickr30K and MS-COCO, verify the effectiveness of the proposed framework. Equipped with only two embedding layers, the framework achieves $O(1)$ querying time complexity and, when applied before common image-text retrieval methods, improves retrieval efficiency while maintaining their performance. Our code will be released.
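To make the keyword-matching step concrete, the following is a minimal sketch of pre-screening with an inverted index. It is not the authors' released code: the function names `build_inverted_index` and `pre_screen`, the toy gallery, and the rule "keep any sample that shares at least one keyword with the query" are all assumptions made for illustration, and the keywords are written by hand here rather than predicted by a multi-label classifier.

```python
from collections import defaultdict


def build_inverted_index(gallery_keywords):
    """Map each keyword to the set of gallery ids tagged with it.

    `gallery_keywords` is a hypothetical dict: sample id -> set of keywords.
    In the framework described above, these keywords would come from the
    multi-label classifiers; here they are hand-written.
    """
    index = defaultdict(set)
    for sample_id, keywords in gallery_keywords.items():
        for keyword in keywords:
            index[keyword].add(sample_id)
    return index


def pre_screen(index, query_keywords):
    """Return gallery ids that share at least one keyword with the query.

    Each keyword is resolved by a hash lookup, so no pass over the
    whole gallery is needed to find the candidates.
    """
    candidates = set()
    for keyword in query_keywords:
        # .get avoids creating empty entries for unseen keywords.
        candidates |= index.get(keyword, set())
    return candidates


# Toy gallery (assumed data, for illustration only).
gallery = {
    "img_0": {"dog", "grass"},
    "img_1": {"cat", "sofa"},
    "img_2": {"dog", "beach"},
    "img_3": {"car", "street"},
}

index = build_inverted_index(gallery)
candidates = pre_screen(index, {"dog", "frisbee"})
print(sorted(candidates))  # ['img_0', 'img_2']
```

Only the surviving candidates would then be passed to the (more expensive) retrieval network. In this sketch the cost of a query depends on the number of query keywords and the size of the matching posting sets, not on a scan of every gallery sample.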

