Despite its success in self-supervised learning, contrastive learning (CL) is less studied in the supervised setting. In this work, we first use a set of pilot experiments to show that, in the supervised setting, the cross-entropy (CE) objective and the contrastive learning objective often conflict with each other, hindering the application of CL in supervised settings. To resolve this problem, we introduce a novel \underline{A}ligned \underline{C}ontrastive \underline{L}earning (ACL) framework. First, ACL-Embed treats label embeddings as extra augmented samples with different labels and employs contrastive learning to align each label embedding with the representations of its samples. Second, to facilitate the joint optimization of the ACL-Embed objective and the CE loss, we propose ACL-Grad, which discards the ACL-Embed term whenever the two objectives conflict. To further enhance the performance of the intermediate exits of multi-exit BERT, we propose cross-layer ACL (ACL-CL), which asks the teacher exit to guide the optimization of the shallow student exits. Extensive experiments on the GLUE benchmark yield the following takeaways: (a) ACL-BERT outperforms or performs comparably with CE and CE+SCL on the GLUE tasks; (b) ACL, especially ACL-CL, significantly surpasses the baseline methods when fine-tuning multi-exit BERT, thus providing better quality-speed tradeoffs for low-latency applications.
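To make the ACL-Grad idea concrete, below is a minimal PyTorch sketch of one interpretation of the conflict check described above: compute the gradients of the CE loss and the ACL-Embed loss separately, and drop the ACL-Embed gradient when the two gradients disagree (negative dot product). The function name, arguments, and the dot-product criterion are illustrative assumptions, not the authors' released implementation.

```python
import torch

def acl_grad_step(model, ce_loss, acl_embed_loss, optimizer):
    """One optimization step that discards the ACL-Embed gradient on conflict.

    Hypothetical sketch: `model`, `ce_loss`, `acl_embed_loss`, and `optimizer`
    are assumed to come from the usual fine-tuning forward pass.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the cross-entropy objective.
    ce_grads = torch.autograd.grad(ce_loss, params, retain_graph=True)
    # Gradient of the ACL-Embed contrastive objective.
    acl_grads = torch.autograd.grad(acl_embed_loss, params, retain_graph=True)

    # Measure agreement between the two objectives via the gradient dot product.
    dot = sum((g_ce * g_acl).sum() for g_ce, g_acl in zip(ce_grads, acl_grads))

    optimizer.zero_grad()
    for p, g_ce, g_acl in zip(params, ce_grads, acl_grads):
        # Keep the ACL-Embed gradient only when it does not conflict with CE.
        p.grad = (g_ce + g_acl) if dot >= 0 else g_ce
    optimizer.step()
```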