Large Language Models (LLMs) excel at language understanding but remain limited in knowledge-intensive domains due to hallucinations, outdated information, and limited explainability. Text-based retrieval-augmented generation (RAG) helps ground model outputs in external sources but struggles with multi-hop reasoning. Knowledge Graphs (KGs), in contrast, support precise, explainable querying, yet require knowledge of graph query languages. This work introduces an interactive framework in which LLMs generate and explain Cypher graph queries and users iteratively refine them through natural language. Applied to real-world KGs, the framework improves accessibility to complex datasets while preserving factual accuracy and semantic rigor, and it provides insight into how model performance varies across domains. Our core quantitative evaluation is a 90-query benchmark on a synthetic movie KG that measures query explanation quality and fault detection across multiple LLMs, complemented by two smaller real-life query-generation experiments on a Hyena KG and the MaRDI (Mathematical Research Data Initiative) KG.