SSLCL: An Efficient Model-Agnostic Supervised Contrastive Learning Framework for Emotion Recognition in Conversations

Emotion recognition in conversations (ERC) is a rapidly evolving task within the natural language processing community that aims to detect the emotions expressed by speakers during a conversation. Recently, a growing number of ERC methods have focused on leveraging supervised contrastive learning (SCL) to enhance the robustness and generalizability of learned features. However, current SCL-based approaches in ERC are constrained by the need for large batch sizes and are incompatible with most existing ERC models. To address these challenges, we propose an efficient and model-agnostic SCL framework named Supervised Sample-Label Contrastive Learning with Soft-HGR Maximal Correlation (SSLCL), which eliminates the need for a large batch size and can be seamlessly integrated with existing ERC models without introducing any model-specific assumptions. Specifically, we introduce a novel perspective on utilizing label representations: discrete labels are projected into dense embeddings through a shallow multilayer perceptron, and the training objective is formulated to maximize the similarity between sample features and their corresponding ground-truth label embeddings while minimizing the similarity between sample features and the label embeddings of other classes. Moreover, we adopt the Soft-HGR maximal correlation as the measure of similarity between sample features and label embeddings, leading to significant performance improvements over conventional similarity measures. Additionally, SSLCL leverages the multimodal cues of utterances as data augmentations to boost model performance. Extensive experiments on two ERC benchmark datasets, IEMOCAP and MELD, demonstrate that SSLCL is compatible with existing ERC models and outperforms existing state-of-the-art SCL methods. Our code is available at \url{https://github.com/TaoShi1998/SSLCL}.
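To make the sample-label objective concrete, the following is a minimal sketch of the general idea, not the SSLCL implementation. It makes several assumptions that do not come from the abstract: a plain dot product stands in for the Soft-HGR similarity (which is not reproduced here), the loss takes a softmax cross-entropy form over the label embeddings, the MLP weights are randomly initialized rather than trained, and all function names and dimensions are hypothetical.

```python
import math
import random


def mlp_label_embeddings(num_classes, hidden_dim, embed_dim, rng):
    """Map each one-hot label to a dense embedding with a one-hidden-layer MLP.

    Weights are random here (assumption); in training they would be learned.
    """
    w1 = [[rng.gauss(0, 0.5) for _ in range(hidden_dim)] for _ in range(num_classes)]
    w2 = [[rng.gauss(0, 0.5) for _ in range(embed_dim)] for _ in range(hidden_dim)]
    embeddings = []
    for c in range(num_classes):
        # A one-hot input for class c simply selects row c of w1; apply ReLU.
        hidden = [max(0.0, h) for h in w1[c]]
        embeddings.append(
            [sum(hidden[j] * w2[j][k] for j in range(hidden_dim)) for k in range(embed_dim)]
        )
    return embeddings


def dot(u, v):
    """Dot-product similarity (a stand-in for Soft-HGR, by assumption)."""
    return sum(a * b for a, b in zip(u, v))


def sample_label_contrastive_loss(features, labels, label_embeddings):
    """Average of -log softmax(similarities)[true label] over the samples.

    Each sample is contrasted against the label embeddings only, so the number
    of negatives is (num_classes - 1) regardless of the batch size.
    """
    total = 0.0
    for feat, y in zip(features, labels):
        sims = [dot(feat, emb) for emb in label_embeddings]
        m = max(sims)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_z - sims[y]
    return total / len(features)


if __name__ == "__main__":
    rng = random.Random(0)
    label_emb = mlp_label_embeddings(num_classes=6, hidden_dim=8, embed_dim=4, rng=rng)
    # Two hypothetical utterance features; a real ERC model would produce these.
    feats = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
    print(sample_label_contrastive_loss(feats, [0, 3], label_emb))
```

Because every sample is compared only with the fixed set of label embeddings, the loss is well defined even for a batch containing a single utterance, which is the property the abstract contrasts with sample-to-sample SCL. Any encoder that outputs a feature vector per utterance could supply `features`, which is what a model-agnostic design would require.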
