In this paper we examine the limitations of Large Language Models (LLMs) on complex reasoning tasks. While current approaches leverage formal languages as intermediate representations of reasoning problems, they struggle both with generating these intermediate formal specifications and with refining them. To address these issues, this paper proposes Logic-LM++, an improvement on Logic-LM. It uses the ability of LLMs to perform pairwise comparisons, allowing the refinements suggested by the LLM to be evaluated before they are applied. We demonstrate that Logic-LM++ outperforms Logic-LM and other LLM-based techniques on natural language reasoning tasks on two datasets, FOLIO and AR-LSAT, showing an average improvement of 13.5% over standard prompting, 11% over chain-of-thought prompting, and 5% over Logic-LM.