Analogy-making is central to human cognition, allowing us to adapt to novel situations, an ability that current AI systems still lack. Most analogy datasets today focus on simple analogies (e.g., word analogies); datasets that include complex types of analogies are typically manually curated and very small. We believe that this holds back progress in computational analogy. In this work, we design a data generation pipeline, ParallelPARC (Parallel Paragraph Creator), that leverages state-of-the-art Large Language Models (LLMs) to create complex, paragraph-based analogies, as well as distractors, both simple and challenging. We demonstrate our pipeline and use it to create ProPara-Logy, a dataset of analogies between scientific processes. We publish a gold-set, validated by humans, and a silver-set, generated automatically. We test LLMs' and humans' analogy recognition in binary and multiple-choice settings, and find that humans outperform the best models (~13% gap) after light supervision. We demonstrate that our silver-set is useful for training models. Lastly, we show that challenging distractors confuse LLMs, but not humans. We hope our pipeline will encourage research in this emerging field.