Existing unsupervised hashing methods typically adopt a feature similarity preservation paradigm. As a result, they overlook the intrinsic discrepancy in similarity capacity between the continuous feature space and the discrete hash code space. Specifically, because the feature similarity distribution is intrinsically biased (e.g., negative pairs receive moderately positive similarity scores), the hash code similarities of positive and negative pairs often become inseparable (the similarity collapse problem). To solve this problem, we introduce a novel Similarity Distribution Calibration (SDC) method. Instead of matching individual pairwise similarity scores, SDC aligns the hash code similarity distribution with a calibration distribution (e.g., a beta distribution) that spreads sufficiently across the entire similarity capacity/range, which alleviates similarity collapse. Extensive experiments show that SDC outperforms state-of-the-art alternatives on both coarse category-level and instance-level image retrieval tasks, often by a large margin. Code is available at https://github.com/kamwoh/sdc.
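To make the idea of aligning a similarity distribution with a calibration distribution concrete, the following is a minimal sketch and not the released implementation. It assumes continuous (pre-binarization) codes, cosine similarity, a Beta(5, 5) calibration distribution rescaled to [-1, 1], and a sorted-sample (1-D Wasserstein-style) matching loss. The function names `pairwise_cosine` and `calibration_loss`, the beta parameters, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_cosine(codes):
    """Cosine similarity for every distinct pair of rows in `codes`."""
    normed = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    sim = normed @ normed.T
    upper = np.triu_indices(len(codes), k=1)  # upper triangle, no self-pairs
    return sim[upper]


def calibration_loss(codes, a=5.0, b=5.0, seed=0):
    """Mean squared gap between sorted code similarities and sorted
    Beta(a, b) samples rescaled to [-1, 1] (assumed calibration target)."""
    sims = np.sort(pairwise_cosine(codes))
    rng = np.random.default_rng(seed)
    target = np.sort(2.0 * rng.beta(a, b, size=sims.size) - 1.0)
    return float(np.mean((sims - target) ** 2))


rng = np.random.default_rng(0)

# Toy "collapsed" codes: every row is a noisy copy of one sign pattern,
# so all pairwise similarities crowd near +1.
base = np.sign(rng.normal(size=(1, 32)))
collapsed = base + 0.1 * rng.normal(size=(64, 32))

# Toy "spread" codes: independent rows, similarities scattered around 0.
spread = rng.normal(size=(64, 32))

loss_collapsed = calibration_loss(collapsed)
loss_spread = calibration_loss(spread)
print(loss_collapsed, loss_spread)
```

In this toy setting the collapsed codes incur a much larger loss than the spread codes, which is the behavior a distribution-level objective is meant to penalize. A trainable version would compute the same quantity in an automatic differentiation framework so that gradients flow back to the hashing network.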