To address the need for a more comprehensive evaluation of French Natural Language Understanding (NLU), we introduce COLE, a new benchmark composed of 23 diverse tasks covering a broad range of NLU capabilities, including sentiment analysis, paraphrase detection, grammatical judgment, and reasoning, with a particular focus on linguistic phenomena relevant to the French language. We benchmark 94 large language models (LLMs), providing an extensive analysis of the current state of French NLU. Our results highlight a significant performance gap between closed- and open-weight models and identify tasks that remain challenging for current LLMs, such as zero-shot extractive question answering (QA), fine-grained word sense disambiguation, and understanding of regional language variations. We release COLE as a public resource to foster further progress in French language modelling.