Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out, a fate it shares with many other German dialects. Natural language processing (NLP) has the potential to support efforts to preserve and revive languages and dialects, yet so far no NLP research has looked at Meenzerisch. This work presents the first NLP research explicitly focused on the dialect of Mainz. We introduce a digital dictionary, an NLP-ready dataset derived from an existing resource (Schramm, 1966), to support researchers in modeling and benchmarking the dialect. It contains 2,351 dialect words paired with their meanings described in Standard German. We then use this dataset to answer two research questions: (1) Can state-of-the-art large language models (LLMs) generate definitions for dialect words? (2) Can LLMs generate words in Meenzerisch, given their definitions? Our experiments show that LLMs can do neither: the best model for definitions reaches only 6.27% accuracy, and the best word generation model reaches only 1.51%. We then conduct two additional experiments to see whether accuracy improves with few-shot learning and with rules that are extracted from the training set and passed to the LLM. While both approaches improve the results, accuracy remains below 10%. This highlights that additional resources and intensified research efforts focused on German dialects are desperately needed.
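To make the word generation setup more concrete, the following is a minimal sketch of how a few-shot prompt could be assembled from dictionary entries and how predictions could be scored. It is not the authors' implementation: the abstract does not specify the prompt wording or the scoring procedure, so the `Entry` type, the prompt template, the `normalize` step, and the use of exact string match are all assumptions made for illustration. The entries in the usage note are placeholders, not items from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary item (hypothetical schema, not the dataset's actual format)."""
    word: str        # dialect word
    definition: str  # meaning described in Standard German


def build_few_shot_prompt(examples, definition):
    """Format solved examples followed by the unsolved query.

    The instruction text and field labels are assumed, not taken from the paper.
    """
    blocks = ["Give the Meenzerisch word for each Standard German definition."]
    for ex in examples:
        blocks.append(f"Definition: {ex.definition}\nWord: {ex.word}")
    blocks.append(f"Definition: {definition}\nWord:")
    return "\n\n".join(blocks)


def normalize(text):
    """Case-fold and collapse whitespace before comparing (an assumed choice)."""
    return " ".join(text.casefold().split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

For example, with placeholder entries `Entry("wort_a", "Bedeutung A")` and `Entry("wort_b", "Bedeutung B")` as demonstrations, `build_few_shot_prompt(examples, "Bedeutung C")` yields a prompt that ends in an open `Word:` slot for the model to complete. Exact match is a strict criterion for dialect spellings, which often vary, so a real evaluation might reasonably use a more lenient comparison.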
