Supervised contrastive learning has achieved remarkable success by leveraging label information; however, determining positive samples in multi-label scenarios remains a critical challenge. In multi-label supervised contrastive learning (MSCL), relations among multi-label samples are not yet fully defined, leading to ambiguity in identifying positive samples and in formulating contrastive loss functions to construct the representation space. To address these challenges, we: (i) first define five distinct multi-label relations in MSCL to systematically identify positive samples, (ii) introduce a novel Similarity-Dissimilarity Loss that dynamically re-weights samples by computing similarity and dissimilarity factors between positive samples and a given anchor based on these multi-label relations, and (iii) further provide a theoretically grounded proof, via rigorous mathematical analysis, that supports the formulation and effectiveness of the proposed loss function. We conduct experiments across both image and text modalities and extend the evaluation to the medical domain. The results demonstrate that our method consistently outperforms baselines in a comprehensive evaluation, confirming its effectiveness and robustness. Code is available at: https://github.com/guangminghuang/similarity-dissimilarity-loss.
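To make the re-weighting idea concrete, here is a minimal sketch of a supervised contrastive loss whose positive pairs are weighted by label-set overlap. The Jaccard-style similarity factor, the function names, and the exact weighting scheme are illustrative assumptions for exposition; the paper's actual similarity and dissimilarity factors are defined by its five multi-label relations and may differ.

```python
import numpy as np

def similarity_factors(labels):
    """Pairwise label-overlap factors for multi-hot label vectors.
    Illustrative assumption: a Jaccard-style ratio |L_i ∩ L_j| / |L_i ∪ L_j|
    stands in for the paper's similarity/dissimilarity factors."""
    labels = np.asarray(labels, dtype=float)
    inter = labels @ labels.T                                   # |L_i ∩ L_j|
    union = labels.sum(1)[:, None] + labels.sum(1)[None, :] - inter
    return np.divide(inter, np.maximum(union, 1.0))             # 0 when disjoint

def weighted_supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss with each positive pair's log-probability
    re-weighted by its similarity factor (a sketch, not the authors' exact loss)."""
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)            # L2-normalize
    sim = f @ f.T / temperature
    np.fill_diagonal(sim, -np.inf)                              # exclude self-contrast
    log_prob = sim - np.log(np.exp(sim).sum(1, keepdims=True))  # log-softmax per row
    np.fill_diagonal(log_prob, 0.0)                             # self term contributes 0
    w = similarity_factors(labels)
    np.fill_diagonal(w, 0.0)                                    # anchor is not its own positive
    n_pos = np.maximum((w > 0).sum(1), 1)                       # avoid division by zero
    return float((-(w * log_prob).sum(1) / n_pos).mean())
```

Under this sketch, anchors sharing all labels with a sample pull on it with full weight, partially overlapping samples pull with proportionally smaller weight, and label-disjoint samples act only as negatives in the softmax denominator.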