Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled at deductive reasoning, for example using chain-of-thought prompting or iterative tool use to solve challenging tasks step by step. In this paper, we focus on evaluating and teaching LLMs to perform inductive reasoning, that is, to infer underlying rules by observing examples or sequential transformations. However, collecting large-scale and diverse human-generated inductive data is challenging. We therefore focus on data synthesis in the code domain and propose a \textbf{Case2Code} task that exploits the expressiveness and correctness of programs. Specifically, we collect a diverse set of executable programs, synthesize input-output transformations for each program, and require LLMs to infer the underlying code implementation from the synthetic I/O cases. We first evaluate representative LLMs on the synthesized Case2Code task and show that case-to-code induction is challenging for them. We then synthesize large-scale Case2Code training samples to train LLMs to perform inductive reasoning. Experimental results show that such induction training not only improves in-distribution Case2Code performance but also enhances various coding abilities of the trained LLMs, demonstrating the great potential of learning inductive reasoning from synthetic data.
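The synthesis pipeline described above can be illustrated with a minimal sketch: take an executable program, run it on sample inputs to obtain verified input-output transformations, and format those cases as an induction query for the model. The specific program, helper names, and prompt format here are hypothetical illustrations, not the paper's actual implementation.

```python
def reverse_words(s: str) -> str:
    """A stand-in for one executable program collected from a corpus."""
    return " ".join(reversed(s.split()))

def synthesize_cases(fn, inputs):
    """Execute the program on sample inputs to produce verified I/O pairs."""
    return [(x, fn(x)) for x in inputs]

def build_induction_prompt(cases):
    """Format the synthetic cases as a Case2Code-style query: the model
    must infer the underlying implementation from observations alone."""
    lines = ["Infer the Python function that produces these transformations:"]
    for x, y in cases:
        lines.append(f"  f({x!r}) -> {y!r}")
    return "\n".join(lines)

cases = synthesize_cases(reverse_words, ["hello world", "a b c"])
print(build_induction_prompt(cases))
```

Because the target program is executed rather than hand-annotated, every synthesized case is correct by construction, which is what makes large-scale generation feasible.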