Emotion recognition in software engineering texts is critical for understanding developer expressions and improving collaboration. This paper presents a comparative analysis of state-of-the-art Pre-trained Language Models (PTMs) for fine-grained emotion classification on two benchmark datasets from GitHub and Stack Overflow. We evaluate six transformer models (BERT, RoBERTa, ALBERT, DeBERTa, CodeBERT, and GraphCodeBERT) against the current best-performing tool, SEntiMoji. Our analysis reveals consistent improvements over SEntiMoji, ranging from 1.17\% to 16.79\% in macro-averaged and micro-averaged F1 scores, with general-domain models outperforming specialized ones. To further enhance PTMs, we incorporate polarity features in the attention layer during training, demonstrating additional average gains of 1.0\% to 10.23\% over baseline PTM approaches. Our work provides strong evidence for the advancements afforded by PTMs in recognizing nuanced emotions such as Anger, Love, Fear, Joy, Sadness, and Surprise in software engineering contexts. Through comprehensive benchmarking and error analysis, we also outline where further improvement is needed to address contextual gaps.
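The two reported metrics weight classes differently: macro-averaged F1 is the unweighted mean of per-class F1 scores, while micro-averaged F1 pools true positives, false positives, and false negatives across all classes before computing a single score. The following is a minimal sketch of that distinction, assuming single-label predictions; the helper name `f1_scores` and the toy labels are illustrative only and are not taken from the paper's data or evaluation code, whose setup may differ.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p wrongly
            fn[t] += 1          # missed the true class t

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, each class counting equally
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy example (invented labels, not from the benchmark datasets)
y_true = ["Joy", "Joy", "Joy", "Anger"]
y_pred = ["Joy", "Joy", "Anger", "Anger"]
macro, micro = f1_scores(y_true, y_pred)
print(f"macro-F1 = {macro:.4f}, micro-F1 = {micro:.4f}")
```

On imbalanced emotion classes the two averages can diverge noticeably, because macro-F1 gives rare classes the same weight as frequent ones, which is one reason to report both.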