Chain-of-Thought (CoT) prompting and its variants aim to equip large language models (LLMs) with high-level reasoning abilities by emulating human-like linear cognition and logic. However, human thinking is complex and mixes linear and nonlinear processes. In this work, we propose \textbf{I}nferential \textbf{E}xclusion \textbf{P}rompting (IEP), a novel prompting method that combines the principles of elimination and inference to guide LLMs to think nonlinearly. IEP guides LLMs to plan and then use Natural Language Inference (NLI) to deduce each possible solution's entailment relation with the context, commonsense, or facts, thereby yielding a broader perspective by reasoning backward. This forward-planning and backward-eliminating process allows IEP to simulate complex human thinking better than other CoT-based methods, which reflect only linear cognitive processes. In a series of empirical studies, IEP consistently outperforms CoT across various tasks. Additionally, we observe that integrating IEP with CoT further improves LLM performance on certain tasks, highlighting the need to equip LLMs with mixed logic processes. Moreover, to better evaluate the comprehensive features inherent in human logic, we introduce the \textbf{M}ental-\textbf{A}bility \textbf{R}easoning \textbf{B}enchmark (MARB). The benchmark comprises six novel subtasks with a total of 9,115 questions, of which 1,685 come with hand-crafted rationale references. We believe that \textsc{IEP} and \textsc{MARB} offer a promising direction for unveiling LLMs' logic and verbal reasoning abilities and can drive further advancements. \textsc{MARB} will be available at~\texttt{anonymity link} soon.
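The backward-eliminating step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `exclude_by_inference`, the three-way label set, and the lookup-table `toy_judge` are all assumptions made for illustration. In IEP itself the entailment judgment is elicited from the LLM through prompting; here a hypothetical callable stands in for that judgment so the control flow is visible.

```python
from typing import Callable, List, Tuple

# Assumed label set: the abstract only speaks of an "entailment relation",
# so this conventional three-way NLI scheme is an illustrative choice.
ENTAILMENT = "entailment"
NEUTRAL = "neutral"
CONTRADICTION = "contradiction"

# Hypothetical judge signature: (premise, hypothesis) -> NLI label.
NLIJudge = Callable[[str, str], str]


def exclude_by_inference(
    context: str, candidates: List[str], judge: NLIJudge
) -> Tuple[List[str], List[str]]:
    """Split candidates into (kept, eliminated) using the judge's NLI label."""
    kept: List[str] = []
    eliminated: List[str] = []
    for candidate in candidates:
        label = judge(context, candidate)
        if label == CONTRADICTION:
            # The candidate conflicts with the context, so it is excluded.
            eliminated.append(candidate)
        else:
            # Entailed or neutral candidates survive this round.
            kept.append(candidate)
    return kept, eliminated


def toy_judge(premise: str, hypothesis: str) -> str:
    """Stand-in for an LLM or NLI model: a fixed lookup table, for demo only."""
    table = {
        "The animal is resting.": ENTAILMENT,
        "The cat is running outside.": CONTRADICTION,
    }
    return table.get(hypothesis, NEUTRAL)


context = "The cat is asleep on the sofa."
candidates = [
    "The animal is resting.",
    "The cat is running outside.",
    "The cat is grey.",
]
kept, eliminated = exclude_by_inference(context, candidates, toy_judge)
print("kept:", kept)
print("eliminated:", eliminated)
```

Passing the judge as a callable keeps the elimination loop independent of how the entailment label is obtained, so the lookup table could be swapped for a prompted LLM or an NLI classifier without changing the loop.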