Fact-seeking question answering with large language models (LLMs) remains unreliable when answers depend on up-to-date or conflicting information. Although retrieval-augmented and tool-using LLMs reduce hallucinations, they often rely on implicit planning, leading to inefficient tool usage. We propose a modular framework that explicitly separates planning from factual retrieval and answer synthesis. A lightweight student planner is trained via a teacher-student framework to generate structured decompositions consisting of abstract reasoning steps and searchable fact requests. The supervision signals contain only planning traces and fact requests, without providing factual answers or retrieved evidence. At inference, the planner produces plans, while prompt-engineered modules perform retrieval and response synthesis. We evaluate the proposed framework on SEAL-0, a highly challenging benchmark for search-augmented LLMs. Results show that supervised planning improves accuracy and reduces latency compared to monolithic reasoning models and prompt-based tool-augmented frameworks, demonstrating that explicitly learned planning structures are essential for reliable fact-seeking LLMs.
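The plan-then-retrieve-then-synthesize separation described above can be sketched as three decoupled stages. This is a minimal illustrative sketch, not the paper's implementation: the names `Plan`, `plan`, `retrieve`, `synthesize`, and `answer` are hypothetical, the planner here is a hard-coded stand-in for the trained student model, and the retrieval and synthesis stubs stand in for prompt-engineered LLM modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Plan:
    """Structured decomposition emitted by the planner."""
    reasoning_steps: List[str]  # abstract reasoning steps (no facts)
    fact_requests: List[str]    # searchable fact requests


def plan(question: str) -> Plan:
    # Stand-in for the trained student planner: it decomposes the question
    # into steps and fact requests but never produces factual answers,
    # mirroring the supervision signal (planning traces only).
    return Plan(
        reasoning_steps=[
            "Identify the entity the question asks about",
            "Determine which fact must be looked up",
        ],
        fact_requests=[f"evidence for: {question}"],
    )


def retrieve(fact_requests: List[str],
             search: Callable[[str], str]) -> Dict[str, str]:
    # Prompt-engineered retrieval module: issues one search call per
    # fact request and collects the evidence.
    return {req: search(req) for req in fact_requests}


def synthesize(question: str, p: Plan, evidence: Dict[str, str]) -> str:
    # Prompt-engineered synthesis module: combines the plan's reasoning
    # steps with the retrieved evidence into a final response.
    facts = "; ".join(evidence.values())
    return f"Answer to {question!r} grounded in: {facts}"


def answer(question: str, search: Callable[[str], str]) -> str:
    # The full modular pipeline: planning is explicit and happens once,
    # before any tool call, rather than being interleaved implicitly.
    p = plan(question)
    evidence = retrieve(p.fact_requests, search)
    return synthesize(question, p, evidence)
```

Because the stages communicate only through the `Plan` and the evidence dictionary, the planner can be trained and swapped independently of the retrieval and synthesis modules.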