Making analogies is fundamental to cognition. Proportional analogies, which consist of four terms, are often used to assess linguistic and cognitive abilities. For instance, completing analogies like "Oxygen is to Gas as <blank> is to <blank>" requires identifying the semantic relationship (e.g., "type of") between the first pair of terms ("Oxygen" and "Gas") and finding a second pair that shares the same relationship (e.g., "Aluminum" and "Metal"). In this work, we introduce a 15K Multiple-Choice Question Answering (MCQA) dataset for proportional analogy completion and evaluate the performance of contemporary Large Language Models (LLMs) in various knowledge-enhanced prompt settings. Specifically, we augment prompts with three types of knowledge: exemplar, structured, and targeted. Our results show that despite extensive training data, solving proportional analogies remains challenging for current LLMs, with the best model achieving an accuracy of 55%. Notably, we find that providing targeted knowledge can better assist models in completing proportional analogies compared to providing exemplars or collections of structured knowledge. Our code and data are available at: https://github.com/Thiliniiw/KnowledgePrompts/
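To make the task concrete, here is a minimal sketch (in no way the authors' actual prompt template; the function name, hint wording, and choice layout are illustrative assumptions) of how a knowledge-enhanced MCQA prompt for proportional analogy completion might be assembled:

```python
# Illustrative sketch, NOT the paper's exact prompt format: build a
# multiple-choice prompt for "A is to B as <blank> is to <blank>",
# optionally augmented with a piece of targeted knowledge (the hint).

def build_prompt(pair, choices, knowledge=None):
    """Format an MCQA prompt for proportional analogy completion."""
    a, b = pair
    lines = [f'Complete the analogy: "{a} is to {b} as <blank> is to <blank>"']
    if knowledge:  # e.g., the semantic relation linking the first pair
        lines.append(f"Hint: {knowledge}")
    for label, (c, d) in zip("ABCD", choices):
        lines.append(f"{label}. {c} is to {d}")
    lines.append("Answer with the letter of the best choice.")
    return "\n".join(lines)

prompt = build_prompt(
    ("Oxygen", "Gas"),
    [("Aluminum", "Metal"), ("Water", "Ice"),
     ("Paris", "France"), ("Dog", "Bark")],
    knowledge='"Oxygen" is a type of "Gas"; pick the pair with the same relation.',
)
print(prompt)
```

In the exemplar setting the hint would instead be one or more solved analogies, and in the structured setting a collection of knowledge-graph triples; only the `knowledge` argument changes.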