Hash representation learning of multi-view heterogeneous data is key to improving the accuracy of multimedia retrieval. However, existing methods train their models using only local similarity and ignore global similarity. Furthermore, most recent works fuse multi-view features via a weighted sum or concatenation, which we contend is insufficient for capturing the interactions between views. Together, these limitations result in poor retrieval accuracy. To address these problems, we present a novel Central Similarity Multi-View Hashing (CSMVH) method. Central similarity learning addresses the local similarity problem by exploiting the global similarity between samples and hash centers. We also present extensive empirical evidence demonstrating the superiority of gate-based fusion over conventional fusion approaches. On the MS COCO and NUS-WIDE datasets, the proposed CSMVH outperforms state-of-the-art methods by a large margin (up to an 11.41% improvement in mean Average Precision (mAP)).
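To make the two ideas in the abstract concrete, the following is a minimal sketch, not the CSMVH implementation. It assumes that hash centers are taken from the rows of a Hadamard matrix, that the central similarity loss is a binary cross-entropy between per-bit probabilities and the class center, and that gate-based fusion is an element-wise sigmoid gate mixing an image view and a text view. The abstract confirms none of these details, and all function names (`hash_centers`, `gated_fusion`, `central_similarity_loss`) are hypothetical.

```python
import math


def hadamard(n):
    """Build an n x n Hadamard matrix (Sylvester construction); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Double the matrix: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def hash_centers(num_classes, bits):
    """Assumed center construction: Hadamard rows mapped from {-1, +1} to {0, 1}."""
    if num_classes > bits:
        raise ValueError("this sketch supports at most `bits` classes")
    return [[(v + 1) // 2 for v in row] for row in hadamard(bits)[:num_classes]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(img, txt, gate_logits):
    """Assumed gate form: fused = g * img + (1 - g) * txt, with g = sigmoid(gate_logits).

    In a trained model the gate logits would come from a learned layer over
    both views; here they are passed in directly to keep the sketch small.
    """
    gates = [sigmoid(z) for z in gate_logits]
    return [g * a + (1.0 - g) * b for g, a, b in zip(gates, img, txt)]


def central_similarity_loss(probs, center, eps=1e-7):
    """Mean binary cross-entropy between per-bit probabilities and a hash center."""
    total = 0.0
    for p, c in zip(probs, center):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= c * math.log(p) + (1 - c) * math.log(1.0 - p)
    return total / len(center)
```

Under these assumptions, every sample of a class is pulled toward one shared center rather than toward individual neighbours, which is one way to read "global similarity". Because distinct Hadamard rows differ in exactly half of their bits, the centers of different classes are well separated in Hamming space.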