Tuned Contrastive Learning

Contrastive-learning-based loss functions have recently become increasingly popular for visual self-supervised representation learning owing to their state-of-the-art (SOTA) performance. Most modern contrastive loss functions, such as that of SimCLR, are InfoNCE-based and admit only one positive and multiple negatives per anchor. A recent state-of-the-art loss, the supervised contrastive (SupCon) loss, extends self-supervised contrastive learning to the supervised setting by generalizing to multiple positives and multiple negatives in a batch, and improves upon the cross-entropy loss. In this paper, we propose a novel contrastive loss function, the Tuned Contrastive Learning (TCL) loss, which generalizes to multiple positives and multiple negatives within a batch and offers parameters to tune and improve the gradient responses from hard positives and hard negatives. We provide a theoretical analysis of our loss function's gradient response and show mathematically how it is better than that of the SupCon loss. Empirically, we compare our loss function with the SupCon loss and the cross-entropy loss in a supervised setting on multiple classification datasets. We also show that our loss function is stable across a range of hyperparameter settings. Finally, we compare TCL with various SOTA self-supervised learning methods and show that our loss function achieves performance on par with SOTA methods in both supervised and self-supervised settings.
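To make "multiple positives and multiple negatives per anchor" concrete, here is a minimal sketch of the SupCon baseline the abstract compares against, in its commonly cited form where the log-probabilities are averaged over each anchor's positives. It is not the TCL loss, whose formula and tuning parameters are not given in this abstract. The function name `supcon_loss`, the default `temperature=0.1`, and the plain-list input format are assumptions made for illustration.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Mean SupCon loss over anchors that have at least one positive.

    embeddings: list of L2-normalised vectors (lists of floats), assumed format
    labels: list of class labels, one per embedding
    """
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled dot products of anchor i with every other sample.
        sims = {
            a: sum(x * y for x, y in zip(embeddings[i], embeddings[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Positives are all other samples in the batch sharing the anchor's label.
        positives = [a for a in sims if labels[a] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Log of the softmax denominator, computed stably via log-sum-exp.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average the negative log-probability over all positives of the anchor.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

With exactly one positive per anchor, the inner average has a single term and the expression reduces to the InfoNCE-style form described above. The loss is small when same-class embeddings coincide and large when the labels cut across the clusters.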


