Unsupervised hashing methods typically aim to preserve the similarity between data points in a feature space by mapping them to binary hash codes. However, these methods often overlook the fact that similarity in the continuous feature space may not be preserved in the discrete hash code space, because hash codes have a limited similarity range. This range is bounded by the code length and can lead to a problem known as similarity collapse: positive and negative pairs of data points become less distinguishable from each other in the hash space. To alleviate this problem, this paper introduces a novel Similarity Distribution Calibration (SDC) method. SDC aligns the hash code similarity distribution towards a calibration distribution (e.g., beta distribution) with sufficient spread across the entire similarity range, thereby alleviating similarity collapse. Extensive experiments show that SDC significantly outperforms state-of-the-art alternatives on coarse category-level and instance-level image retrieval. Code is available at https://github.com/kamwoh/sdc.
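To make the bounded similarity range concrete, the following is a minimal sketch and not the SDC implementation from the repository above. It assumes codes in {-1, +1}^K compared by cosine similarity, in which case only K + 1 distinct similarity values exist. The function names, the Beta(2, 2) parameters, the rescaling of the beta samples to [-1, 1], and the use of a 1-D Wasserstein-1 distance to measure the gap to the calibration distribution are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def similarity_levels(code_length):
    """All cosine similarities attainable between two codes in {-1, +1}^K.

    Two codes at Hamming distance d have cosine similarity 1 - 2d/K,
    so a K-bit code offers only K + 1 distinct similarity values.
    """
    hamming = np.arange(code_length + 1)
    return 1.0 - 2.0 * hamming / code_length


def pairwise_code_similarity(codes):
    """Cosine similarity of every unordered pair of rows in a {-1, +1} matrix."""
    k = codes.shape[1]
    sims = codes @ codes.T / k
    upper = np.triu_indices(len(codes), k=1)  # skip self-pairs and duplicates
    return sims[upper]


def calibration_gap(sims, a=2.0, b=2.0, seed=0):
    """Illustrative gap between observed similarities and a beta target.

    Assumption: the target is Beta(a, b) rescaled from [0, 1] to [-1, 1],
    and the gap is the empirical 1-D Wasserstein-1 distance, which for two
    equal-sized samples is the mean absolute difference of sorted values.
    """
    rng = np.random.default_rng(seed)
    target = 2.0 * rng.beta(a, b, size=sims.size) - 1.0
    return float(np.mean(np.abs(np.sort(sims) - np.sort(target))))


# Hypothetical usage: 200 random 16-bit codes standing in for hash outputs.
rng = np.random.default_rng(0)
codes = rng.choice([-1, 1], size=(200, 16))
sims = pairwise_code_similarity(codes)
print("attainable levels for K=16:", similarity_levels(16).size)
print("distinct levels observed:", np.unique(sims).size)
print("gap to rescaled Beta(2, 2):", calibration_gap(sims))
```

A shorter code length leaves fewer attainable levels, so many pairs are forced onto the same similarity value. A gap measure of this kind is one way to quantify how far an observed similarity distribution is from a target that spreads across the whole range.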