Large Language Models (LLMs) have become an increasingly important tool in research and society at large. While LLMs are regularly used around the world by experts and laypeople alike, they are predominantly developed with English-speaking users in mind: they perform well in English and other widespread languages, while less-resourced languages such as Luxembourgish are treated as a lower priority. This lack of attention is also reflected in the sparsity of available evaluation tools and datasets. In this study, we investigate the viability of language proficiency exams as evaluation tools for the Luxembourgish language. We find that large models such as Claude and DeepSeek-R1 typically achieve high scores, while smaller models perform poorly. We also find that performance on such language exams can be used to predict performance on other NLP tasks in Luxembourgish.