Reasoning is a fundamental component for achieving language understanding. Among the multiple types of reasoning, conditional reasoning, the ability to draw different conclusions depending on some condition, has been understudied in large language models (LLMs). Recent prompting methods, such as chain of thought, have significantly improved LLMs on reasoning tasks. Nevertheless, there is still little understanding of what triggers reasoning abilities in LLMs. We hypothesize that code prompts can trigger conditional reasoning in LLMs trained on text and code. We propose a chain of prompts that transforms a natural language problem into code and prompts the LLM with the generated code. Our experiments find that code prompts exhibit a performance boost between 2.6 and 7.7 points on GPT 3.5 across multiple datasets requiring conditional reasoning. We then conduct experiments to discover how code prompts elicit conditional reasoning abilities and through which features. We observe that prompts need to contain natural language text accompanied by high-quality code that closely represents the semantics of the instance text. Furthermore, we show that code prompts are more efficient, requiring fewer demonstrations, and that they trigger superior state tracking of variables or key entities.
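The chain of prompts described above can be sketched in two steps: first ask the model to rewrite the natural language problem as code, then prompt the model with that generated code. The sketch below is a minimal illustration under stated assumptions, not the prompts used in the experiments. The names `LLM`, `build_translation_prompt`, `build_answer_prompt`, and `code_prompt_chain` are hypothetical, the prompt wording is invented for illustration, and `llm` stands for any function that maps a prompt string to a completion string. Keeping the original sentences as comments reflects the observation that prompts need natural language text accompanied by code.

```python
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def build_translation_prompt(document: str, question: str) -> str:
    """Step 1 prompt: ask the model to rewrite the problem as commented code.

    The instruction wording is an assumption made for illustration.
    """
    return (
        "Rewrite the following text as Python-like code. Keep each original "
        "sentence as a comment above the code that represents it, and use "
        "variables for the key entities and if-statements for the conditions.\n\n"
        f"Document:\n{document}\n\n"
        f"Question:\n{question}\n\n"
        "Code:\n"
    )


def build_answer_prompt(code: str) -> str:
    """Step 2 prompt: present the generated code and ask for the answer."""
    return f"{code}\n\n# Answer to the question above:\n"


def code_prompt_chain(llm: LLM, document: str, question: str) -> str:
    """Run the two-step chain: text -> code, then code -> answer."""
    generated_code = llm(build_translation_prompt(document, question))
    return llm(build_answer_prompt(generated_code))
```

To try the chain, pass any callable as `llm`, for example a thin wrapper around a model API or a stub that returns canned strings. The two model calls are kept separate so that the intermediate code can be inspected or edited before the answering step.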