SSLCL: An Efficient Model-Agnostic Supervised Contrastive Learning Framework for Emotion Recognition in Conversations

Emotion recognition in conversations (ERC) is a rapidly evolving task in natural language processing that aims to detect the emotions expressed by speakers during a conversation. Recently, a growing number of ERC methods have used supervised contrastive learning (SCL) to enhance the robustness and generalizability of learned features. However, current SCL-based approaches in ERC are constrained by their reliance on large batch sizes and their lack of compatibility with most existing ERC models. To address these challenges, we propose an efficient and model-agnostic SCL framework named Supervised Sample-Label Contrastive Learning with Soft-HGR Maximal Correlation (SSLCL), which eliminates the need for a large batch size and can be seamlessly integrated with existing ERC models without introducing any model-specific assumptions. Specifically, we introduce a novel perspective on utilizing label representations: discrete labels are projected into dense embeddings through a shallow multilayer perceptron, and the training objective maximizes the similarity between sample features and their corresponding ground-truth label embeddings while minimizing the similarity between sample features and the label embeddings of other classes. Moreover, we adopt the Soft-HGR maximal correlation as the measure of similarity between sample features and label embeddings, which leads to significant performance improvements over conventional similarity measures. Additionally, SSLCL leverages multimodal cues of utterances as data augmentations to boost model performance. Extensive experiments on two ERC benchmark datasets, IEMOCAP and MELD, demonstrate the compatibility and superiority of our proposed SSLCL framework compared to existing state-of-the-art SCL methods. Our code is available at \url{https://github.com/TaoShi1998/SSLCL}.
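
The sample-label idea described above can be illustrated with a minimal sketch. This is not the released SSLCL implementation: the function names (`mlp_label_embeddings`, `sample_label_contrastive_loss`, `soft_hgr`), the random untrained weights, and the temperature parameter are all hypothetical. The loss uses a plain dot product as a stand-in similarity, and `soft_hgr` follows the batch-level Soft-HGR objective as it is commonly stated in the literature, E[f^T g] - 1/2 tr(cov(f) cov(g)), which may differ from the exact form used in SSLCL.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mlp_label_embeddings(num_classes, hidden_dim, embed_dim, seed=0):
    """Project one-hot labels to dense embeddings with a one-hidden-layer MLP.

    Weights are random and untrained (illustration only); in practice they
    would be learned jointly with the ERC model.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(hidden_dim)] for _ in range(num_classes)]
    w2 = [[rng.gauss(0, 0.5) for _ in range(embed_dim)] for _ in range(hidden_dim)]
    embeddings = []
    for k in range(num_classes):
        # A one-hot input for class k simply selects row k of w1; apply ReLU.
        hidden = [max(0.0, h) for h in w1[k]]
        out = [sum(hidden[j] * w2[j][d] for j in range(hidden_dim))
               for d in range(embed_dim)]
        embeddings.append(out)
    return embeddings


def sample_label_contrastive_loss(feature, label, label_embeddings, temperature=1.0):
    """Contrast one sample against every label embedding.

    The positive is the ground-truth label embedding and the negatives are
    the embeddings of all other classes, so the number of positives and
    negatives is fixed by the number of classes, not by the batch size.
    Similarity here is a dot product (a stand-in, not Soft-HGR).
    """
    sims = [dot(feature, e) / temperature for e in label_embeddings]
    m = max(sims)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(s - m) for s in sims))
    return log_norm - sims[label]


def soft_hgr(features, embeddings):
    """Batch-level Soft-HGR objective: E[f^T g] - 0.5 * tr(cov(f) cov(g)).

    Both inputs are lists of n vectors with the same dimensionality (n >= 2).
    """
    n = len(features)

    def center(rows):
        dim = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dim)]
        return [[r[d] - means[d] for d in range(dim)] for r in rows]

    def covariance(rows):
        dim = len(rows[0])
        return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(dim)]
                for i in range(dim)]

    f, g = center(features), center(embeddings)
    inner = sum(dot(a, b) for a, b in zip(f, g)) / (n - 1)
    cov_f, cov_g = covariance(f), covariance(g)
    dim = len(cov_f)
    trace = sum(cov_f[i][j] * cov_g[j][i] for i in range(dim) for j in range(dim))
    return inner - 0.5 * trace
```

Because each sample is compared only with the fixed set of label embeddings, a loss of this shape can in principle be added to any model that produces utterance-level features, which is the sense in which such an objective is model-agnostic.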
