3D cross-modal retrieval is gaining attention in the multimedia community. Central to this topic is learning a joint embedding space that represents data from different modalities, such as images, 3D point clouds, and polygon meshes, so that modality-invariant and discriminative features can be extracted. The performance of cross-modal retrieval methods therefore depends heavily on the representational capacity of this embedding space. Existing methods treat all instances equally: they apply the same penalty strength regardless of how difficult an instance is. This can result in ambiguous convergence or local optima, severely compromising the separability of the feature space. To address this limitation, we propose an Instance-Variant loss that assigns different penalty strengths to different instances, improving the separability of the embedding space. Specifically, we assign each instance a penalty weight that is positively related to its intra-class distance. At the same time, we reduce the cross-modal discrepancy between features by learning a shared weight vector for same-class data from different modalities. Using the Gaussian RBF kernel to evaluate sample similarity, we further propose an Intra-Class loss function that minimizes the distance among same-class instances. Extensive experiments on three 3D cross-modal datasets show that the proposed method surpasses recent state-of-the-art approaches.
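The abstract names two ingredients without giving their formulas: penalty weights that grow with intra-class distance, and an intra-class term built on Gaussian RBF similarity. The following is a minimal sketch of how such quantities could be computed, not the authors' formulation. It assumes that intra-class distance is the Euclidean distance to a class center pooled over modalities, that weights are normalized to mean 1, and that the intra-class term averages 1 minus the RBF similarity over same-class pairs. All function names and the `gamma` parameter are hypothetical.

```python
import math


def rbf_similarity(x, y, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def class_centers(features, labels):
    """Mean feature vector per class, pooled over all modalities (assumption)."""
    sums, counts = {}, {}
    for feat, cls in zip(features, labels):
        acc = sums.setdefault(cls, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: [v / counts[cls] for v in acc] for cls, acc in sums.items()}


def instance_weights(features, labels, centers):
    """Weights proportional to distance from the class center, scaled to mean 1.

    This normalization is an illustrative choice, not taken from the paper.
    """
    dists = [math.dist(feat, centers[cls]) for feat, cls in zip(features, labels)]
    mean_dist = sum(dists) / len(dists)
    if mean_dist == 0.0:
        # All instances sit on their centers: fall back to uniform weights.
        return [1.0] * len(dists)
    return [d / mean_dist for d in dists]


def intra_class_loss(features, labels, gamma=1.0):
    """Mean of (1 - RBF similarity) over all same-class pairs (assumption)."""
    total, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if labels[i] == labels[j]:
                total += 1.0 - rbf_similarity(features[i], features[j], gamma)
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy embeddings: class 0 is spread out, class 1 is already compact.
features = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 10.0]]
labels = [0, 0, 1, 1]
centers = class_centers(features, labels)
weights = instance_weights(features, labels, centers)
loss = intra_class_loss(features, labels)
```

In this toy example the two spread-out instances of class 0 receive larger weights than the already compact instances of class 1, which is the behavior the abstract describes: harder instances are penalized more strongly.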