Inductive reasoning is an essential capability for large language models (LLMs) to achieve higher intelligence: it requires a model to generalize rules from observed facts and then apply them to unseen examples. We present MIRAGE, a synthetic dataset that addresses the limitations of previous work, namely the lack of comprehensive evaluation and flexible test data. With it, we evaluate LLMs' capabilities in both the inductive and deductive stages, allowing flexible variation of the input distribution, task scenario, and task difficulty to analyze the factors that influence LLMs' inductive reasoning. Based on these multi-faceted evaluations, we demonstrate that LLMs are poor rule-based reasoners: in many cases, when conducting inductive reasoning, they do not rely on a correct rule to answer unseen cases. Across different prompting methods, numbers of observations, and task forms, models consistently produce correct deductions without having induced correct rules. Moreover, we find that LLMs are good neighbor-based reasoners: during inductive reasoning, a model tends to focus on observed facts that are close to the current test example in feature space. By leveraging these similar examples, the model maintains strong inductive capabilities within a localized region, significantly improving its deductive performance.