In recent years, sentiment analysis has gained significant importance in natural language processing. However, most existing sentiment analysis models and datasets have been developed for high-resource languages such as English and Chinese, leaving low-resource languages, particularly African languages, largely unexplored. The AfriSenti-SemEval 2023 Shared Task 12 aims to fill this gap by evaluating sentiment analysis models on low-resource African languages. In this paper, we present our solution to the shared task, in which we employed several multilingual XLM-R models with a classification head, trained on various data, including models retrained on African dialects and fine-tuned on the target languages. Our team achieved the third-best result in Subtask B, Track 16: Multilingual, demonstrating the effectiveness of our approach. Although our model performed relatively well on multilingual data, it performed poorly on some languages. Our findings highlight the importance of developing more comprehensive datasets and models for low-resource African languages to advance sentiment analysis research. Our solution is also available in a GitHub repository.