Idiom understanding and dialect understanding are both well-established benchmark tasks in natural language processing. In this paper, we propose combining them by using regional idioms as a test of dialect understanding. To this end, we introduce two new benchmark datasets for the Quebec dialect of French: QFrCoRE, which contains 4,633 instances of idiomatic phrases, and QFrCoRT, which comprises 171 instances of regional idiomatic words. We also introduce a new benchmark for Metropolitan French expressions, MFrCoE, which comprises 4,938 phrases. We describe how these corpora were constructed so that our methodology can be replicated for other dialects. Our experiments with 111 LLMs reveal a critical disparity in dialectal competence: while models perform well on Metropolitan French, 65.8% of them perform significantly worse on Quebec idioms, with only 9.0% favoring the regional dialect. These results confirm that our benchmarks are a reliable tool for quantifying the dialect gap and that prestige-language proficiency does not guarantee regional dialect understanding.