As a promising research field, Multi-Query Image Retrieval (MQIR) aims to retrieve the semantically relevant image given multiple region-specific text queries. Existing works mainly focus on a single-level similarity between image regions and text queries. This neglects the hierarchical guidance of multi-level similarities and results in incomplete alignments. Moreover, the high-level semantic correlations that intrinsically connect different region-query pairs are rarely considered. To address these limitations, we propose a novel Hierarchical Matching and Reasoning Network (HMRN) for MQIR. It disentangles MQIR into three hierarchical semantic representations, which are responsible for capturing fine-grained local details, contextual global scopes, and high-level inherent correlations, respectively. HMRN comprises two modules: a Scalar-based Matching (SM) module and a Vector-based Reasoning (VR) module. Specifically, the SM module characterizes the multi-level alignment similarity, which consists of a fine-grained local-level similarity and a context-aware global-level similarity. The VR module then mines the latent semantic correlations among multiple region-query pairs to obtain a high-level reasoning similarity. Finally, the three levels of similarity are aggregated in a joint similarity space to form the final similarity. Extensive experiments on the benchmark dataset demonstrate that HMRN substantially surpasses current state-of-the-art methods. For instance, compared with Drill-down, the best existing method, HMRN improves R@1 in the last round by 23.4%. Our source code will be released at https://github.com/LZH-053/HMRN.
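The abstract does not say how the three similarity levels are fused or how R@1 is computed. As a rough illustration only, the sketch below assumes that the local, global, and reasoning similarities have already been computed per (query set, image) pair and are fused by a simple weighted sum. It then scores retrieval with the standard Recall@K definition. The names `joint_similarity` and `recall_at_k`, the weights, and the toy scores are hypothetical and are not taken from HMRN.

```python
from typing import Sequence


def joint_similarity(local: float, global_: float, reasoning: float,
                     weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Fuse three similarity levels into one score.

    Hypothetical fusion: a plain weighted sum. HMRN's actual joint
    similarity space may combine the levels differently.
    """
    w_l, w_g, w_r = weights
    return w_l * local + w_g * global_ + w_r * reasoning


def recall_at_k(scores: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of query sets whose ground-truth image ranks in the top k.

    scores[i][j] is the similarity between query set i and image j;
    the ground-truth image for query set i is assumed to be image i.
    """
    hits = 0
    for i, row in enumerate(scores):
        # Rank image indices by descending score (stable sort keeps index order on ties).
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if i in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy per-level scores for 3 query sets x 3 images (made-up numbers).
LOCAL = [[0.9, 0.2, 0.1], [0.6, 0.5, 0.1], [0.2, 0.3, 0.7]]
GLOBAL = [[0.8, 0.1, 0.2], [0.4, 0.6, 0.3], [0.1, 0.2, 0.9]]
REASONING = [[0.7, 0.3, 0.2], [0.5, 0.7, 0.2], [0.3, 0.1, 0.6]]

fused = [
    [joint_similarity(LOCAL[i][j], GLOBAL[i][j], REASONING[i][j]) for j in range(3)]
    for i in range(3)
]

print(recall_at_k(LOCAL, 1))  # local level alone misranks query set 1
print(recall_at_k(fused, 1))  # 1.0
```

In this toy data, the local score by itself ranks the wrong image first for query set 1. Adding the global and reasoning scores corrects that ranking, which loosely mirrors the abstract's argument that a single level gives incomplete alignments. In an interactive multi-round setting, "R@1 in the last round" would apply the same metric after all rounds of queries have been accumulated.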
翻译:作为一项富有前景的研究领域,多查询图像检索(MQIR)旨在根据多个区域特定的文本查询搜索语义相关的图像。现有工作主要关注图像区域与文本查询之间的单层级相似性,忽略了对多层级相似性的层级化指导,导致对齐不完整。此外,内在连接不同区域-查询对的高层语义相关性也鲜有考虑。为解决上述局限,我们提出了一种新颖的层级匹配与推理网络(HMRN)用于MQIR。该网络将MQIR解耦为三种层级的语义表示,分别负责捕捉细粒度的局部细节、上下文全局范围以及高层内在相关性。HMRN包含两个模块:基于标量的匹配(SM)模块和基于向量的推理(VR)模块。具体而言,SM模块刻画了多层级对齐相似性,包括细粒度的局部层级相似性和上下文感知的全局层级相似性。随后,VR模块被开发用于挖掘多个区域-查询对之间潜在的语义相关性,进一步探索高层推理相似性。最后,这三种层级的相似性被聚合到一个联合相似性空间中以形成最终相似性。在基准数据集上的大量实验表明,我们的HMRN大幅超越了当前最先进的方法。例如,与现有最优方法Drill-down相比,最后一轮评估中的R@1指标提升了23.4%。我们的源代码将在https://github.com/LZH-053/HMRN发布。