Deep hashing approaches, including deep quantization and deep binary hashing, have become a common solution for large-scale image retrieval owing to their computational and storage efficiency. However, most existing hashing methods cannot produce satisfactory results for fine-grained retrieval because they usually generate binary codes from the output of the last CNN layer. Since deeper layers tend to abstract visual clues (e.g., texture) into high-level semantics (e.g., dogs and cats), the features produced by the last CNN layer are less effective at capturing the subtle but discriminative visual details that mostly reside in shallow layers. To improve fine-grained image hashing, we propose Pyramid Hybrid Pooling Quantization (PHPQ). Specifically, we propose a Pyramid Hybrid Pooling (PHP) module that captures and preserves fine-grained semantic information from multi-level features, emphasizing the subtle differences between sub-categories. In addition, we propose a learnable quantization module with a partial codebook attention mechanism, which helps optimize the most relevant codewords and improves quantization. Comprehensive experiments on two widely used public benchmarks, CUB-200-2011 and Stanford Dogs, demonstrate that PHPQ outperforms state-of-the-art methods.
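The abstract names the two components but does not describe their internals. The NumPy sketch below is one plausible reading, not the authors' implementation. It assumes that "hybrid pooling" means adding average and max pooling over 1x1, 2x2 and 4x4 grids of each feature map, and that "partial codebook attention" means a softmax restricted to the `top_k` nearest codewords in each sub-codebook of a product quantizer. All function names, pyramid levels, feature-map shapes, `top_k`, the temperature `tau`, and the random projection (a stand-in for a learned embedding layer) are hypothetical.

```python
import numpy as np


def hybrid_pool_bins(fmap, n):
    """Split a (C, H, W) map into an n x n grid and return avg + max pooling per bin."""
    c, h, w = fmap.shape
    # Integer bin edges; every bin is non-empty as long as h >= n and w >= n.
    hs = np.linspace(0, h, n + 1).astype(int)
    ws = np.linspace(0, w, n + 1).astype(int)
    pooled = []
    for i in range(n):
        for j in range(n):
            patch = fmap[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
            # Assumed "hybrid" pooling: sum of average pooling and max pooling.
            pooled.append(patch.mean(axis=(1, 2)) + patch.max(axis=(1, 2)))
    return np.concatenate(pooled)


def pyramid_hybrid_pool(feature_maps, levels=(1, 2, 4)):
    """Pool every (C, H, W) map at every pyramid level and concatenate the results."""
    return np.concatenate(
        [hybrid_pool_bins(f, n) for f in feature_maps for n in levels]
    )


def partial_attention_quantize(x, codebooks, top_k=4, tau=1.0):
    """Soft product quantization restricted to the top_k most similar codewords.

    x: (D,) vector; codebooks: (M, K, D // M). Returns the soft reconstruction
    (D,) and the hard code indices (M,) that would be stored for retrieval.
    """
    m, _, d_sub = codebooks.shape
    subs = x.reshape(m, d_sub)
    soft = np.empty_like(subs)
    codes = np.empty(m, dtype=int)
    for i in range(m):
        # Similarity = negative squared Euclidean distance to each codeword.
        sims = -((codebooks[i] - subs[i]) ** 2).sum(axis=1)
        idx = np.argsort(sims)[-top_k:]  # the top_k most similar codewords
        logits = sims[idx] / tau
        weights = np.exp(logits - logits.max())  # numerically stable softmax
        weights /= weights.sum()
        soft[i] = weights @ codebooks[i][idx]
        codes[i] = int(np.argmax(sims))  # hard assignment = nearest codeword
    return soft.reshape(-1), codes


rng = np.random.default_rng(0)
# Hypothetical shallow-to-deep feature maps from three CNN stages.
feats = [rng.standard_normal((64, 28, 28)),
         rng.standard_normal((128, 14, 14)),
         rng.standard_normal((256, 7, 7))]
v = pyramid_hybrid_pool(feats)  # (64 + 128 + 256) * (1 + 4 + 16) = 9408 dims
proj = rng.standard_normal((v.size, 32)) / np.sqrt(v.size)  # stand-in for a learned layer
z = v @ proj
codebooks = rng.standard_normal((4, 16, 8))  # M=4 codebooks, K=16 codewords each
soft, codes = partial_attention_quantize(z, codebooks, top_k=4)
# Each code selects 1 of 16 codewords, so 4 codes amount to a 16-bit code.
print(v.shape, soft.shape, codes)
```

In this sketch, restricting the softmax to a few candidates means that, during training, gradients would reach only the codewords most relevant to each sub-vector, which matches the intuition the abstract gives for optimizing the most relevant codewords. Setting `top_k=1` reduces it to ordinary hard product quantization.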