To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning in real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but challenges remain in fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where interconnected tool calls require holistic and efficient planning of tool usage. In this work, we propose a new method for LLMs to better leverage tools in multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. This planning with abstract chains enables LLMs to learn more general reasoning strategies, which are robust to shifts of domain knowledge (e.g., math results) relevant to different reasoning questions. It also allows LLMs to perform decoding and calling of external tools in parallel, which avoids the inference delay caused by waiting for tool responses. In mathematical reasoning and Wiki QA domains, we show that our method consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets, with an average ~6% absolute QA accuracy improvement. LLM agents trained with our method also show more efficient tool use, with inference on average ~1.4x faster than baseline tool-augmented LLMs.
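The two-stage idea described above (decode a chain with abstract placeholders, then let a tool fill them in) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the bracketed `[expression = name]` placeholder syntax, the `calculator` stand-in for a domain tool, and the `reify` helper are hypothetical and are not taken from the paper's implementation.

```python
import ast
import operator
import re

# Hypothetical placeholder syntax "[expression = name]" (an assumption for this sketch).
STEP = re.compile(r"\[([^\[\]=]+)=\s*(\w+)\s*\]")

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str):
    """Toy stand-in for a domain tool: evaluates +, -, *, / over numbers."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return ev(ast.parse(expression, mode="eval").body)


def reify(chain: str) -> str:
    """Fill each placeholder in an abstract chain with the tool's result."""
    values = {}

    def fill(match):
        expr, name = match.group(1), match.group(2)
        # Substitute results of earlier steps (e.g. y1) before calling the tool.
        for known, val in values.items():
            expr = re.sub(rf"\b{re.escape(known)}\b", str(val), expr)
        result = calculator(expr.strip())
        values[name] = result
        return str(result)

    # re.sub visits matches left to right, so earlier placeholders resolve first.
    return STEP.sub(fill, chain)


# An abstract chain as a model might decode it (hand-written here, not model output).
chain = "Ann has [20 + 35 = y1] apples; doubling gives [y1 * 2 = y2]."
print(reify(chain))  # → Ann has 55 apples; doubling gives 110.
```

Because the chain itself contains no concrete results, the same abstract chain stays valid when the numbers in the question change; only the tool's answers differ. This sketch fills placeholders sequentially and does not model the parallel decoding and tool calling mentioned above.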