Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is not available or up-to-date in models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. Despite being promising, these chain-based methods suffer from: 1) Negative retrieval. Unnecessary or incorrect retrieval may mislead the reasoning; 2) Limited sight. Lacking the ability to look backward or forward, a local error in one step will propagate along the chain. In this paper, we propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree). First, LLMs translate a complex question into a query tree, in which each non-root node denotes a sub-question of its parent node. Then, probabilistic reasoning is conducted over the tree, by solving questions from leaf to root considering the confidence of both question decomposing and answering. During reasoning, for leaf nodes, LLMs choose a more confident answer from Closed-book QA that employs parametric knowledge and Open-book QA that employs retrieved external knowledge, thus eliminating the negative retrieval problem. For non-leaf nodes, with the hierarchical structure, LLMs have broader sights and are able to globally reason with the information from child nodes, thus recovering from local errors. The experiments on three Complex QA datasets under the open-domain setting show that our approach outperforms SOTA methods significantly, demonstrating the effect of probabilistic tree-of-thought reasoning.
翻译:大型语言模型(LLMs)能够通过思维链(CoT)推理回答知识密集型复杂问题,但当所需知识在模型参数中不可用或未及时更新时,它们倾向于生成事实上错误的推理步骤。近期研究转向检索外部知识以增强CoT推理。然而,这些基于链的方法虽具潜力,但仍存在以下问题:1)负向检索。不必要或错误的检索可能误导推理过程;2)视野受限。缺乏回溯或前瞻能力,单个步骤的局部错误将沿链传播。本文提出一种新方法:概率树状思维推理(ProbTree)。首先,LLMs将复杂问题转化为查询树,其中每个非根节点代表其父节点的子问题。随后,在树上进行概率推理,通过结合问题分解与回答的置信度,从叶节点到根节点依次求解问题。在推理过程中,对于叶节点,LLMs从使用参数化知识的闭卷问答与使用检索外部知识的开卷问答中选择置信度更高的答案,从而消除负向检索问题。对于非叶节点,借助层次结构,LLMs拥有更广阔的视野,能够利用子节点信息进行全局推理,从而从局部错误中恢复。在开放域设置下的三个复杂问答数据集上的实验表明,我们的方法显著优于现有最先进方法,验证了概率树状思维推理的有效性。