Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs

Open-world question answering (OW-QA) over knowledge graphs (KGs) aims to answer questions when the underlying KG is incomplete or evolving. Traditional KGQA assumes a closed world in which every answer already exists in the KG, which limits its real-world applicability. Open-world QA, in contrast, requires inferring missing knowledge from graph structure and context. Large language models (LLMs) excel at language understanding but lack structured reasoning, while graph neural networks (GNNs) model graph topology but struggle with semantic interpretation. Existing systems integrate LLMs with GNNs or graph retrievers. Some of them support open-world QA but rely on structural embeddings that lack semantic grounding; most assume observed paths or complete graphs, which makes them unreliable when links are missing or multi-hop reasoning is required. We present GLOW, a hybrid system that combines a pre-trained GNN with an LLM for open-world KGQA. The GNN predicts the top-k candidate answers from the graph structure. These candidates, together with relevant KG facts, are serialized into a structured prompt (e.g., triples and candidates) that guides the LLM's reasoning. This enables joint reasoning over symbolic and semantic signals without relying on retrieval or fine-tuning. To evaluate generalization, we introduce GLOW-BENCH, a 1,000-question benchmark over incomplete KGs spanning diverse domains. GLOW outperforms existing LLM-GNN systems on standard benchmarks and on GLOW-BENCH, with improvements of up to 53.3% and of 38% on average. Code and data are available on GitHub.
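The abstract does not specify GLOW's scoring interface or prompt template. The following is only a minimal Python sketch of the described pipeline step (GNN top-k candidates plus KG triples serialized into one structured prompt). It assumes that the pre-trained GNN has already produced a score for each candidate entity. The functions `top_k_candidates` and `build_prompt`, the score dictionary, and the prompt wording are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# A KG fact as a (head, relation, tail) triple.
Triple = Tuple[str, str, str]


def top_k_candidates(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring entities, best first.

    Hypothetical helper: `scores` stands in for the output of a pre-trained GNN
    (entity -> plausibility score). Ties are broken alphabetically so the
    ordering is deterministic.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def build_prompt(question: str,
                 facts: List[Triple],
                 candidates: List[Tuple[str, float]]) -> str:
    """Serialize the question, KG triples, and ranked candidates into one prompt.

    Hypothetical template: the sections and wording are assumptions made for
    illustration only.
    """
    lines = [f"Question: {question}", "", "Knowledge graph facts:"]
    for head, relation, tail in facts:
        lines.append(f"({head}, {relation}, {tail})")
    lines += ["", "GNN candidate answers (score):"]
    for rank, (entity, score) in enumerate(candidates, start=1):
        lines.append(f"{rank}. {entity} ({score:.2f})")
    lines += [
        "",
        "Using the facts and the candidates, answer the question. "
        "The answer may not appear explicitly in the facts.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy scores and facts, invented purely for illustration.
    gnn_scores = {"Paris": 0.91, "Lyon": 0.42, "Berlin": 0.13, "Marseille": 0.42}
    facts = [("Eiffel Tower", "located_in", "Paris"),
             ("Eiffel Tower", "located_in_country", "France")]
    prompt = build_prompt("What is the capital of France?",
                          facts, top_k_candidates(gnn_scores, k=3))
    print(prompt)  # this string would then be sent to the LLM
```

Passing the GNN scores into the prompt, rather than only the candidate names, is one plausible way to expose the structural signal to the LLM while still letting it override a low-confidence ranking using the serialized facts.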
