Recent advances in large language models (LLMs) have significantly impacted the domain of multi-hop question answering (MHQA), where systems must aggregate information and infer answers from disparate pieces of text. However, the autoregressive nature of LLMs poses an inherent challenge: errors made in intermediate reasoning steps can accumulate. This paper introduces Monte-Carlo tree search for Zero-shot multi-hop Question Answering (MZQA), a framework based on Monte-Carlo tree search (MCTS) that identifies optimal reasoning paths in MHQA tasks, mitigating the error propagation inherent in sequential reasoning. Unlike previous works, we propose a zero-shot prompting method that relies solely on instructions, without the support of hand-crafted few-shot examples that typically require domain expertise. We also introduce a behavioral cloning approach (MZQA-BC), trained on self-generated MCTS inference trajectories, which achieves an over 10-fold increase in reasoning speed with minimal compromise in performance. The efficacy of our method is validated on standard benchmarks such as HotpotQA, 2WikiMultihopQA, and MuSiQue, demonstrating that it outperforms existing frameworks.
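To make the MCTS-over-reasoning-paths idea concrete, the following is a minimal, generic UCT sketch, not the paper's implementation: reasoning steps are abstracted to a small discrete action set, and a toy reward scores a completed path (in MZQA the evaluation would instead come from an LLM judging the reasoning trajectory). All names (`Node`, `mcts`, the action labels) are illustrative assumptions.

```python
import math
import random

class Node:
    """One node in the search tree: a partial reasoning path (toy abstraction)."""
    def __init__(self, state, parent=None):
        self.state = state          # tuple of actions taken so far
        self.parent = parent
        self.children = {}          # action -> child Node
        self.visits = 0
        self.value_sum = 0.0

    def uct_score(self, c=1.4):
        # Standard UCT: average value plus an exploration bonus.
        if self.visits == 0:
            return float("inf")
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root_state, actions, reward_fn, horizon, n_iter=1000, seed=0):
    """Generic MCTS loop: select, expand, simulate, backpropagate."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend via UCT while the node is fully expanded
        #    and not yet at the reasoning horizon.
        while len(node.children) == len(actions) and len(node.state) < horizon:
            node = max(node.children.values(), key=lambda n: n.uct_score())
        # 2. Expansion: add one untried child unless at the horizon.
        if len(node.state) < horizon:
            untried = [a for a in actions if a not in node.children]
            a = rng.choice(untried)
            child = Node(node.state + (a,), parent=node)
            node.children[a] = child
            node = child
        # 3. Simulation: random rollout to a complete path, then score it.
        path = node.state
        while len(path) < horizon:
            path = path + (rng.choice(actions),)
        reward = reward_fn(path)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Decision rule: return the most-visited first action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

# Toy usage: the reward is the fraction of steps matching a target path.
ACTIONS = ("retrieve", "reason", "answer")
TARGET = ("retrieve", "reason", "answer")
reward = lambda path: sum(a == b for a, b in zip(path, TARGET)) / len(TARGET)
best_first_step = mcts((), ACTIONS, reward, horizon=3)
```

With this smooth reward, UCT concentrates visits on the subtree whose first step is `"retrieve"`, illustrating how tree search can commit to a promising first reasoning step before the full chain is generated, rather than greedily decoding step by step.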