African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While individual language-specific datasets are being expanded to different tasks, only a handful of NLP tasks (e.g., named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We evaluate baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited to zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (e.g., MAD-X), pattern-exploiting training (PET), prompting language models (e.g., ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and the Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages: it achieves an average performance of 70 F1 points without the additional supervision that approaches like MAD-X leverage. In the few-shot setting, we show that with as few as 10 examples per label, the PET approach achieves 86.0 F1 points, more than 90\% of the performance of fully supervised training (92.6 F1 points).