Learning hash representations of multi-view heterogeneous data is an important task in multimedia retrieval. However, existing methods neither fuse multi-view features effectively nor exploit the metric information provided by dissimilar samples, which limits retrieval precision. Current methods fuse multi-view features by weighted sum or concatenation; we argue that these fusion schemes cannot capture the interaction among different views. Furthermore, these methods ignore the information provided by dissimilar samples. To address these problems, we propose a novel deep metric multi-view hashing (DMMVH) method. We present extensive empirical evidence that gate-based fusion outperforms these typical fusion methods. We also introduce deep metric learning to the multi-view hashing problem, which makes it possible to exploit the metric information of dissimilar samples. On the MIR-Flickr25K, MS COCO, and NUS-WIDE datasets, our method outperforms the current state-of-the-art methods by a large margin (an improvement of up to 15.28 in mean Average Precision (mAP)).
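To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the DMMVH architecture itself. It assumes two views (image and text) already embedded into vectors of equal length, a per-dimension sigmoid gate, and a standard contrastive loss as a stand-in for the metric-learning objective. All function names, the gate parameterization, and the margin value are hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(img, txt, w_img, w_txt, bias):
    """Fuse two same-length view embeddings with an element-wise gate.

    Assumption: the gate for dimension i is a sigmoid of a linear function
    of both views, so the mixing weight varies per sample. A fixed weighted
    sum would use the same weight for every sample.
    """
    fused = []
    for i in range(len(img)):
        g = sigmoid(w_img[i] * img[i] + w_txt[i] * txt[i] + bias[i])
        fused.append(g * img[i] + (1.0 - g) * txt[i])
    return fused


def contrastive_loss(h_a, h_b, similar, margin=2.0):
    """Standard contrastive loss on two relaxed (real-valued) hash codes.

    Similar pairs are pulled together; dissimilar pairs are pushed apart
    until their Euclidean distance reaches the (hypothetical) margin, which
    is how dissimilar samples contribute metric information.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_a, h_b)))
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def to_hash(code):
    """Binarize a real-valued code into {-1, +1} by sign (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in code]


if __name__ == "__main__":
    img, txt = [1.0, -1.0], [3.0, 1.0]
    zeros = [0.0, 0.0]
    fused = gated_fusion(img, txt, zeros, zeros, zeros)
    print(fused, to_hash(fused))
    print(contrastive_loss([0.0, 0.0], [0.0, 0.0], similar=False))
```

With all gate parameters set to zero the gate equals 0.5 everywhere and the fusion reduces to a plain average; learned, non-zero parameters let the balance between views shift with the input, which is the kind of cross-view interaction that a fixed weighted sum or concatenation does not model.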