Sentiment analysis is the process of identifying and extracting subjective information from text. Despite advances in automatic cross-lingual approaches, implementing and evaluating sentiment analysis systems still requires language-specific data that accounts for sociocultural and linguistic peculiarities. In this paper, we describe the collection and annotation of a dataset for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ a transfer learning approach that leverages pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F$_1$ score and accuracy despite the difficulty of the task.