Most research in content-based image retrieval (CBIR) focuses on developing robust feature representations that retrieve, from a database of images, instances that are visually similar to a query. However, the retrieved results sometimes include images that are not semantically related to the query. To address this, we propose a CBIR method that captures both visual and semantic similarity using a visual hierarchy. The hierarchy is constructed by merging classes whose features overlap in the latent space of a deep neural network trained for classification, under the assumption that overlapping classes share high visual and semantic similarity. The constructed hierarchy is then integrated into the distance metric used for similarity search. Experiments on two standard datasets, CUB-200-2011 and CIFAR100, and on a real-life use case with diatom microscopy images show that our method outperforms existing image retrieval methods.
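To make the two steps described above concrete (merging overlapping classes, then using the result in the retrieval distance), here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the method as implemented in the paper: the abstract does not specify the overlap criterion, the depth of the hierarchy, or the form of the combined distance. The sketch assumes each class is summarized by a hypersphere (centroid plus mean distance to the centroid), that two classes "overlap" when their spheres intersect, that a single merge level is enough, and that the query's class is predicted by the classifier. All function names and the `weight` parameter are hypothetical.

```python
import numpy as np

def class_spheres(features, labels):
    """Summarize each class as (centroid, radius) in latent space.
    Assumption: radius = mean distance of class samples to their centroid."""
    spheres = {}
    for c in np.unique(labels):
        pts = features[labels == c]
        centroid = pts.mean(axis=0)
        radius = np.linalg.norm(pts - centroid, axis=1).mean()
        spheres[int(c)] = (centroid, radius)
    return spheres

def merge_overlapping(spheres):
    """Group classes whose spheres intersect, using union-find.
    Returns {class: group_id}; this is a one-level hierarchy only."""
    parent = {c: c for c in spheres}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    classes = sorted(spheres)
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            (ca, ra), (cb, rb) = spheres[a], spheres[b]
            # Assumed overlap test: centroid distance below the sum of radii.
            if np.linalg.norm(ca - cb) < ra + rb:
                parent[find(a)] = find(b)
    return {c: find(c) for c in classes}

def hierarchy_distance(a, b, groups):
    """Assumed semantic cost: 0 same class, 0.5 same merged group, 1 otherwise."""
    if a == b:
        return 0.0
    return 0.5 if groups[a] == groups[b] else 1.0

def retrieve(query_feat, query_class, db_feats, db_classes, groups, weight=1.0, k=5):
    """Rank database items by normalized visual distance plus weighted hierarchy cost."""
    visual = np.linalg.norm(db_feats - query_feat, axis=1)
    visual = visual / (visual.max() + 1e-12)  # scale to [0, 1]
    semantic = np.array(
        [hierarchy_distance(query_class, int(c), groups) for c in db_classes]
    )
    return np.argsort(visual + weight * semantic, kind="stable")[:k]
```

With `weight=0` the ranking reduces to plain visual nearest-neighbor search; increasing `weight` pushes items from the query's own class, and then from its merged group, ahead of visually close items from unrelated classes.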