Large Language Models (LLMs) are increasingly used for question answering over scientific research papers. Existing retrieval-augmentation methods often rely on isolated text chunks or concepts and overlook deeper semantic connections between papers. This impairs the LLM's comprehension of scientific literature and limits the comprehensiveness and specificity of its responses. To address this, we propose Central Entity-Guided Graph Optimization for Community Detection (CE-GOCD), a method that augments LLMs' scientific question answering by explicitly modeling and leveraging semantic substructures within academic knowledge graphs. Our approach operates by: (1) leveraging paper titles as central entities for targeted subgraph retrieval, (2) enhancing implicit semantic discovery via subgraph pruning and completion, and (3) applying community detection to distill coherent paper groups with shared themes. We evaluate the proposed method on three NLP literature-based question-answering datasets; the results show that it outperforms other retrieval-augmented baselines, confirming the effectiveness of our framework.
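To make step (3) concrete, the following is a minimal, illustrative sketch of detecting themed paper communities on a retrieved subgraph via label propagation. This is not the authors' implementation: the algorithm choice, the paper titles, and the edges are all hypothetical, standing in for whatever graph CE-GOCD retrieves and prunes around its central entities.

```python
# Illustrative sketch (NOT the CE-GOCD implementation): group papers in a
# retrieved subgraph into communities via simple label propagation.
# Paper titles and edges below are hypothetical placeholders.
from collections import Counter

def label_propagation(nodes, edges, rounds=10):
    """Repeatedly relabel each node with its neighbours' most common label."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    labels = {n: n for n in nodes}  # start: every paper is its own community
    for _ in range(rounds):
        changed = False
        for n in nodes:
            if not neighbours[n]:
                continue
            counts = Counter(labels[m] for m in neighbours[n])
            best = counts.most_common(1)[0][0]
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:  # converged: no label moved this round
            break
    return labels

# Hypothetical subgraph retrieved around two central entities (paper titles):
papers = ["QA-Survey", "RAG-Methods", "GraphRAG", "Prompt-Tuning", "LoRA"]
links = [("QA-Survey", "RAG-Methods"), ("RAG-Methods", "GraphRAG"),
         ("Prompt-Tuning", "LoRA")]
communities = label_propagation(papers, links)
# Connected retrieval/RAG papers end up sharing one label; the two
# fine-tuning papers form a separate community.
```

Each resulting community is a coherent group of papers with a shared theme, which can then be serialized as context for the LLM; a production system would typically use a modularity-based detector (e.g. Louvain) rather than this toy propagation loop.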