Large Language Models (LLMs) excel at language understanding but remain limited in knowledge-intensive domains due to hallucinations, outdated information, and limited explainability. Text-based retrieval-augmented generation (RAG) helps ground model outputs in external sources but struggles with multi-hop reasoning. Knowledge Graphs (KGs), in contrast, support precise, explainable querying, yet require knowledge of query languages. This work introduces an interactive framework in which LLMs generate and explain Cypher graph queries and users iteratively refine them through natural language. Applied to real-world KGs, the framework improves accessibility to complex datasets while preserving factual accuracy and semantic rigor, and it provides insight into how model performance varies across domains. Our core quantitative evaluation is a 90-query benchmark on a synthetic movie KG that measures query-explanation quality and fault detection across multiple LLMs, complemented by two smaller real-life query-generation experiments on a Hyena KG and the MaRDI (Mathematical Research Data Initiative) KG.