Emotion recognition in software engineering texts is critical for understanding developer expressions and improving collaboration. This paper presents a comparative analysis of state-of-the-art Pre-trained Language Models (PTMs) for fine-grained emotion classification on two benchmark datasets from GitHub and Stack Overflow. We evaluate six transformer models (BERT, RoBERTa, ALBERT, DeBERTa, CodeBERT, and GraphCodeBERT) against SEntiMoji, the current best-performing tool. Our analysis reveals consistent improvements of 1.17\% to 16.79\% in macro-averaged and micro-averaged F1 scores, with general-domain models outperforming specialized ones. To further enhance PTMs, we incorporate polarity features into the attention layer during training, which yields additional average gains of 1.0\% to 10.23\% over the baseline PTM approaches. Our work provides strong evidence for the advancements afforded by PTMs in recognizing nuanced emotions such as Anger, Love, Fear, Joy, Sadness, and Surprise in software engineering contexts. Through comprehensive benchmarking and error analysis, we also outline scope for improvement in addressing contextual gaps.