The analysis of emotions expressed in text has numerous applications. In contrast to categorical analysis, focused on classifying emotions according to a pre-defined set of common classes, dimensional approaches can offer a more nuanced way to distinguish between different emotions. Still, dimensional methods have been less studied in the literature. Considering a valence-arousal dimensional space, this work assesses the use of pre-trained Transformers to predict these two dimensions on a continuous scale, with input texts from multiple languages and domains. We specifically combined multiple annotated datasets from previous studies, corresponding to either emotional lexica or short text documents, and evaluated models of multiple sizes and trained under different settings. Our results show that model size can have a significant impact on the quality of predictions, and that by fine-tuning a large model we can confidently predict valence and arousal in multiple languages. We make available the code, models, and supporting data.
翻译:文本中表达的情绪分析具有诸多应用。不同于基于预定义常见类别进行情绪分类的类别分析,维度方法能够以更细致的方式区分不同情绪。然而,文献中对维度方法的研究相对较少。本研究以效价-唤醒度二维空间为框架,评估了使用预训练Transformer在连续尺度上预测这两个维度的能力,输入文本涵盖多种语言和领域。我们特别整合了以往研究中多个带标注数据集,包括情感词库和短文本文档,并评估了不同规模及不同训练设置下的模型。结果表明,模型规模对预测质量有显著影响,通过微调大规模模型,我们能够可靠地预测多种语言的效价与唤醒度。我们公开了代码、模型及支持数据。