Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, but their performance on complex logical reasoning tasks remains unsatisfactory. Although some prompting methods, such as Chain-of-Thought, can improve the reasoning ability of LLMs to some extent, they suffer from an unfaithfulness issue: the derived conclusion may not align with the generated reasoning chain. To address this issue, some studies employ propositional logic to further enhance the logical reasoning abilities of LLMs. However, potential omissions during the extraction of logical expressions in these methods can cause information loss in the logical reasoning process and thereby produce incorrect results. To this end, we propose Logic-of-Thought (LoT) prompting, which employs propositional logic to generate expanded logical information from the input context and uses this information as an additional augmentation to the input prompt, thereby enhancing logical reasoning capability. LoT is orthogonal to existing prompting methods and can be seamlessly integrated with them. Extensive experiments demonstrate that LoT boosts the performance of various prompting methods by a striking margin across five logical reasoning tasks. In particular, LoT improves Chain-of-Thought's performance on the ReClor dataset by +4.35%; moreover, it improves Chain-of-Thought with Self-Consistency's performance on LogiQA by +5%; additionally, it boosts the performance of Tree-of-Thoughts on the ProofWriter dataset by +8%.
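To make the core idea concrete, the following is a minimal, hypothetical sketch of the logic-expansion step described above: implications extracted from the context are closed under standard propositional laws (contraposition and hypothetical syllogism), yielding extra logical facts that could then be verbalized and appended to the prompt. All function names and the rule set here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: expand extracted implications with propositional laws.
# An implication (p -> q) is stored as a pair (p, q); a leading '~' negates.

def expand_implications(implications):
    """Close a set of implications under contraposition and transitivity."""
    def negate(p):
        # ~~p collapses back to p.
        return p[1:] if p.startswith("~") else "~" + p

    derived = set(implications)
    changed = True
    while changed:
        changed = False
        # Contraposition: (p -> q) yields (~q -> ~p).
        for p, q in list(derived):
            contra = (negate(q), negate(p))
            if contra not in derived:
                derived.add(contra)
                changed = True
        # Hypothetical syllogism: (p -> q) and (q -> r) yield (p -> r).
        for p, q in list(derived):
            for q2, r in list(derived):
                if q == q2 and p != r and (p, r) not in derived:
                    derived.add((p, r))
                    changed = True
    return derived

# Example context: "if it rains, the ground is wet" and
# "if the ground is wet, the game is cancelled".
facts = expand_implications({("rain", "wet"), ("wet", "cancelled")})
# The closure now also contains ("rain", "cancelled") and
# ("~cancelled", "~rain"), which a LoT-style pipeline could translate
# back into natural-language statements to augment the original prompt.
```

The closure loop terminates because the set of possible pairs over a finite vocabulary of (possibly negated) atoms is finite; each iteration either adds a new pair or stops.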