Text-based emotion classification is a task with many applications that has received growing interest in recent years. This paper presents a preliminary study intended to help researchers and practitioners gain insight into relatively new datasets and into emotion classification in general. We focus on three datasets recently presented in the related literature, and we examine how traditional models and state-of-the-art deep learning models perform under different data characteristics. We also explore data augmentation as a way to improve performance. Our experiments show that state-of-the-art models such as RoBERTa perform best in all cases. Finally, we provide observations and discussion that highlight the complexity of emotion classification on these datasets, and we test how well the models apply to real social media posts that we collected and labeled.