In this study, we explore the application of transformer-based models to emotion classification on text data. We train and evaluate several pre-trained transformer variants on the Emotion dataset. The paper also analyzes factors that influence model performance, such as fine-tuning of the transformer layers, layer trainability, and preprocessing of the text data. Our analysis reveals that commonly applied techniques like removing punctuation and stop words can hinder model performance. This is likely because a transformer's strength lies in understanding contextual relationships within text: punctuation and stop words can still convey sentiment or emphasis, and removing them can disrupt this context.
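The risk described above can be illustrated with a minimal sketch. The stop-word list below is a hypothetical sample (not the paper's actual preprocessing pipeline), but "not" does appear in many standard English stop-word lists, so removal can strip the negation an emotion classifier depends on:

```python
# Hypothetical sample of a typical English stop-word list; "not" is
# included in many standard lists (e.g. NLTK's English stopwords).
STOP_WORDS = {"i", "am", "not", "at", "all", "the", "was"}

def strip_stop_words(text: str) -> str:
    """Remove stop words, a common but potentially harmful preprocessing step."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)

original = "I am not happy at all"
stripped = strip_stop_words(original)
# The negation is lost: a clearly negative sentence collapses to "happy",
# flipping the emotional signal a transformer could otherwise use.
print(stripped)
```

Here the cleaned text inverts the sentence's emotion, which is consistent with the finding that such preprocessing can hurt transformer-based classifiers.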