Multi-hop question answering (QA) requires systems to iteratively retrieve evidence and reason across multiple hops. While recent RAG and agentic methods report strong results, the underlying retrieval--reasoning \emph{process} is often left implicit, making procedural choices hard to compare across model families. This survey takes the execution procedure as the unit of analysis and introduces a four-axis framework covering (A) overall execution plan, (B) index structure, (C) next-step control (strategies and triggers), and (D) stop/continue criteria. Using this schema, we map representative multi-hop QA systems and synthesize reported ablations and tendencies on standard benchmarks (e.g., HotpotQA, 2WikiMultiHopQA, MuSiQue), highlighting recurring trade-offs among effectiveness, efficiency, and evidence faithfulness. We conclude with open challenges for retrieval--reasoning agents, including structure-aware planning, transferable control policies, and robust stopping under distribution shift.