Many embedding models have recently been released and are widely used across NLP tasks. The Massive Text Embedding Benchmark (MTEB) has greatly simplified the process of choosing a model that performs well across several tasks in English, but extending it to other languages remains challenging. We therefore extend MTEB to propose the first massive benchmark of sentence embeddings for French. We gather 15 existing datasets in an easy-to-use interface and create three new French datasets, enabling a global evaluation across 8 task categories. We compare 51 carefully selected embedding models on a large scale, conduct comprehensive statistical tests, and analyze the correlation between model performance and many of the models' characteristics. We find that although no model is best on all tasks, large multilingual models pre-trained on sentence similarity perform exceptionally well. Our work comes with open-source code, new datasets, and a public leaderboard.