We present a comprehensive evaluation of the ability of large language models (LLMs) to reason about composition relations, using a benchmark of 1,500 English test cases that cover six types of composition relations: Positional, Comparative, Personal, Mathematical, Identity, and Other. Because multilingual capability matters, we extend the assessment with translations of these cases into Chinese, Japanese, French, and Korean. Our Multilingual Composition Relation (MCR) benchmark aims to investigate the robustness and adaptability of LLMs in composition relation reasoning across diverse linguistic contexts.