Large Language Models (LLMs) show strong reasoning ability in open-domain question answering, yet their reasoning processes are typically linear and often logically inconsistent. In contrast, real-world reasoning requires integrating multiple premises and solving subproblems in parallel. Existing methods such as Chain-of-Thought (CoT) express reasoning in a linear textual form, which may appear coherent but frequently leads to inconsistent conclusions. Recent approaches rely on externally provided graphs and do not explore how LLMs can construct and use their own graph-structured reasoning, particularly in open-domain QA. To fill this gap, we present the first exploration of graph-structured reasoning by LLMs in general-domain question answering. We propose Self-Graph Reasoning (SGR), a framework that enables LLMs to explicitly represent their reasoning process as a structured graph before producing the final answer. We further construct a graph-structured reasoning dataset that merges multiple candidate reasoning graphs into refined graph structures for model training. Experiments on five QA benchmarks spanning both general and specialized domains show that SGR improves reasoning consistency and yields a 17.74% gain over the base model. A LLaMA-3.3-70B model fine-tuned with SGR performs comparably to GPT-4o and surpasses Claude-3.5-Haiku, demonstrating the effectiveness of graph-structured reasoning.