The performance of NLP methods for severely under-resourced languages cannot currently hope to match the state of the art in NLP methods for well-resourced languages. We explore the extent to which pretrained large language models (LLMs) can bridge this gap, using data-to-text generation for Irish, Welsh, Breton and Maltese as an example. We test LLMs on these under-resourced languages and on English in a range of scenarios. We find that LLMs easily set the state of the art for the under-resourced languages by substantial margins, as measured by both automatic and human evaluations. For all our languages, human evaluation shows that our best systems perform on a par with humans, but BLEU scores collapse compared to English, casting doubt on the metric's suitability for evaluating non-task-specific systems. Overall, our results demonstrate the great potential of LLMs to bridge the performance gap for under-resourced languages.