To address the need for a more comprehensive evaluation of French Natural Language Understanding (NLU), we introduce COLE, a new benchmark composed of 23 diverse tasks covering a broad range of NLU capabilities, including sentiment analysis, paraphrase detection, grammatical judgment, and reasoning, with a particular focus on linguistic phenomena relevant to the French language. We benchmark 94 large language models (LLMs), providing an extensive analysis of the current state of French NLU. Our results highlight a significant performance gap between closed- and open-weight models and identify key challenging frontiers for current LLMs, such as zero-shot extractive question-answering (QA), fine-grained word sense disambiguation, and understanding of regional language variations. We release COLE as a public resource to foster further progress in French language modelling.