Emotion recognition in software engineering texts is critical for understanding how developers express themselves and for improving collaboration. This paper presents a comparative analysis of state-of-the-art Pre-trained Language Models (PTMs) for fine-grained emotion classification on two benchmark datasets from GitHub and Stack Overflow. We evaluate six transformer models (BERT, RoBERTa, ALBERT, DeBERTa, CodeBERT, and GraphCodeBERT) against the current best-performing tool, SEntiMoji. Our analysis reveals consistent improvements ranging from 1.17% to 16.79% in macro-averaged and micro-averaged F1 scores, with general-domain models outperforming specialized ones. To further enhance PTMs, we incorporate polarity features in the attention layer during training, yielding additional average gains of 1.0% to 10.23% over the baseline PTM approaches. Our work provides strong evidence for the advancements afforded by PTMs in recognizing nuanced emotions such as Anger, Love, Fear, Joy, Sadness, and Surprise in software engineering contexts. Through comprehensive benchmarking and error analysis, we also outline opportunities for improvement in addressing contextual gaps.
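The macro-averaged and micro-averaged F1 scores reported above weight classes differently, which matters when some emotions are rare. The following is a minimal sketch of the two averages, assuming a single-label multi-class setup over the six emotions; the function names and the toy labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
from collections import Counter

# The six emotion classes named in the abstract.
EMOTIONS = ["Anger", "Love", "Fear", "Joy", "Sadness", "Surprise"]


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when the class never occurs
    in either the gold labels or the predictions (a convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred, labels):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(
        sum(tp[c] for c in labels),
        sum(fp[c] for c in labels),
        sum(fn[c] for c in labels),
    )
    return macro, micro


# Hypothetical toy labels, for illustration only.
y_true = ["Joy", "Joy", "Joy", "Anger", "Fear", "Love"]
y_pred = ["Joy", "Joy", "Anger", "Anger", "Joy", "Love"]

macro, micro = macro_micro_f1(y_true, y_pred, EMOTIONS)
print(round(macro, 4), round(micro, 4))  # 0.3889 0.6667
```

In this toy case the macro score is pulled down by the classes that are missed or absent, while the micro score equals plain accuracy, as it always does in the single-label setting.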