SSLCL: An Efficient Model-Agnostic Supervised Contrastive Learning Framework for Emotion Recognition in Conversations

Emotion recognition in conversations (ERC) is a rapidly evolving task in natural language processing that aims to detect the emotions expressed by speakers during a conversation. Recently, a growing number of ERC methods have leveraged supervised contrastive learning (SCL) to enhance the robustness and generalizability of learned features. However, current SCL-based approaches in ERC are impeded by their reliance on large batch sizes and their lack of compatibility with most existing ERC models. To address these challenges, we propose an efficient and model-agnostic SCL framework named Supervised Sample-Label Contrastive Learning with Soft-HGR Maximal Correlation (SSLCL), which eliminates the need for a large batch size and can be seamlessly integrated with existing ERC models without introducing any model-specific assumptions. Specifically, we introduce a novel perspective on utilizing label representations: discrete labels are projected into dense embeddings through a shallow multilayer perceptron, and the training objective is formulated to maximize the similarity between sample features and their corresponding ground-truth label embeddings while minimizing the similarity between sample features and the label embeddings of other classes. Moreover, we adopt the Soft-HGR maximal correlation as the measure of similarity between sample features and label embeddings, which leads to significant performance improvements over conventional similarity measures. Additionally, SSLCL leverages the multimodal cues of utterances as data augmentations to boost model performance. Extensive experiments on two ERC benchmark datasets, IEMOCAP and MELD, demonstrate the compatibility and superiority of the proposed SSLCL framework compared to existing state-of-the-art SCL methods. Our code is available at https://github.com/TaoShi1998/SSLCL.
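
The sample-label objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, layer sizes, temperature, and the use of cosine similarity in the contrastive loss are all assumptions made for illustration, and the MLP weights are random rather than trained. The `soft_hgr` function computes the batch-level Soft-HGR objective in its commonly stated form (expected inner product of zero-mean features minus half the trace of the product of their covariances); how SSLCL turns this into a similarity between a sample feature and a label embedding is not specified in the abstract and is not reproduced here.

```python
import numpy as np


def label_embeddings(num_classes, hidden_dim, embed_dim, rng):
    """Project one-hot labels into dense embeddings with a shallow MLP.

    Hypothetical helper: weights are randomly initialized here; in
    training they would be learned jointly with the ERC model.
    """
    one_hot = np.eye(num_classes)
    w1 = rng.normal(scale=0.1, size=(num_classes, hidden_dim))
    w2 = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))
    hidden = np.maximum(one_hot @ w1, 0.0)  # ReLU activation
    return hidden @ w2  # shape: (num_classes, embed_dim)


def sample_label_contrastive_loss(features, labels, label_emb,
                                  temperature=0.5, eps=1e-8):
    """Pull each sample toward its own label embedding, away from the rest.

    Cosine similarity is an illustrative stand-in for the similarity
    measure. The number of negatives equals the number of classes minus
    one, so the loss does not depend on a large batch size.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    g = label_emb / (np.linalg.norm(label_emb, axis=1, keepdims=True) + eps)
    logits = f @ g.T / temperature  # (batch, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


def soft_hgr(f, g):
    """Batch-level Soft-HGR objective between paired feature matrices.

    f and g have shape (n, d); row i of f is paired with row i of g.
    """
    f = f - f.mean(axis=0)  # enforce the zero-mean constraint
    g = g - g.mean(axis=0)
    n = f.shape[0]
    inner = (f * g).sum() / (n - 1)  # estimate of E[f^T g]
    cov_f = f.T @ f / (n - 1)
    cov_g = g.T @ g / (n - 1)
    return inner - 0.5 * np.trace(cov_f @ cov_g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = label_embeddings(num_classes=6, hidden_dim=16, embed_dim=8, rng=rng)
    labels = rng.integers(0, 6, size=4)   # a deliberately small batch
    feats = rng.normal(size=(4, 8))       # stand-in for ERC model outputs
    print("contrastive loss:", sample_label_contrastive_loss(feats, labels, emb))
    print("Soft-HGR:", soft_hgr(feats, emb[labels]))
```

Because each sample is contrasted against one embedding per class rather than against other samples in the batch, the sketch runs unchanged with a batch of four utterances, which is the property the abstract attributes to SSLCL.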