Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels

Cross-modal hashing (CMH) has emerged as a popular technique for cross-modal retrieval owing to its low storage cost and high computational efficiency on large-scale data. Most existing methods implicitly assume that multi-modal data are correctly labeled. In real-world scenarios, however, imperfect annotations (i.e., noisy labels) are inevitable, so obtaining fully correct labels is expensive or even unattainable. Inspired by human cognitive learning, a few methods introduce self-paced learning (SPL) to train the model gradually from easy to hard samples, typically to mitigate the effects of feature noise or outliers. How to use SPL to keep noisy labels from misleading the hash model, however, remains largely unexplored. To tackle this problem, we propose a new cognitive cross-modal retrieval method called Robust Self-paced Hashing with Noisy Labels (RSHNL), which mimics the human cognitive process to identify noise while remaining robust to noisy labels. Specifically, we first propose a contrastive hashing learning (CHL) scheme that improves multi-modal consistency, thereby reducing the inherent semantic gap. Next, we propose center aggregation learning (CAL) to mitigate intra-class variations. Finally, we propose Noise-tolerance Self-paced Hashing (NSH), which dynamically estimates the learning difficulty of each instance and uses this difficulty level to distinguish noisy labels. For all pairs estimated to be clean, we further adopt a self-paced regularizer to learn hash codes gradually from easy to hard. Extensive experiments demonstrate that the proposed RSHNL remarkably outperforms state-of-the-art CMH methods.
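The abstract does not specify how NSH measures difficulty or selects samples, so the following is only a rough illustration of the general idea under stated assumptions. It assumes that the per-sample loss serves as the difficulty measure, that samples whose loss exceeds a fixed threshold are treated as noisy, and that the classic hard self-paced regularizer (weight 1 if the loss is below an "age" parameter, else 0) is applied to the remaining clean pairs, with the age parameter growing each epoch. The function names (`split_by_difficulty`, `self_paced_weights`, `weighted_mean_loss`, `age_schedule`), the threshold, and the schedule are hypothetical and are not RSHNL's actual formulation.

```python
from typing import List, Tuple


def split_by_difficulty(losses: List[float], noise_threshold: float) -> Tuple[List[int], List[int]]:
    """Hypothetical rule: samples whose loss exceeds `noise_threshold` are treated as noisy."""
    clean = [i for i, loss in enumerate(losses) if loss <= noise_threshold]
    noisy = [i for i, loss in enumerate(losses) if loss > noise_threshold]
    return clean, noisy


def self_paced_weights(losses: List[float], clean: List[int], age: float) -> List[float]:
    """Hard self-paced regularizer restricted to the estimated clean set.

    Minimizing sum_i v_i * l_i - age * sum_i v_i over v_i in [0, 1] gives the
    closed form v_i = 1 if l_i < age else 0, so only "easy enough" clean samples
    are used. Samples estimated as noisy always receive weight 0.
    """
    weights = [0.0] * len(losses)
    for i in clean:
        weights[i] = 1.0 if losses[i] < age else 0.0
    return weights


def weighted_mean_loss(losses: List[float], weights: List[float]) -> float:
    """Average loss over the selected samples (0.0 if none are selected)."""
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    return sum(w * loss for w, loss in zip(weights, losses)) / total_weight


def age_schedule(epoch: int, age0: float = 0.3, growth: float = 2.0) -> float:
    """Hypothetical geometric schedule: the age grows so harder samples are admitted later."""
    return age0 * growth ** epoch


if __name__ == "__main__":
    # Toy per-sample losses; in practice these would come from the hashing objective.
    losses = [0.1, 0.4, 0.9, 2.5, 3.0]
    clean, noisy = split_by_difficulty(losses, noise_threshold=2.0)
    print("clean:", clean, "noisy:", noisy)
    for epoch in range(3):
        age = age_schedule(epoch)
        weights = self_paced_weights(losses, clean, age)
        print(epoch, weights, round(weighted_mean_loss(losses, weights), 4))
```

With the toy losses above, samples 3 and 4 are flagged as noisy and never contribute, while the number of selected clean samples grows from one to three as the age parameter doubles each epoch. A hard 0/1 regularizer is used here for simplicity; soft self-paced regularizers that assign fractional weights are a common alternative.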
