Large language models (LLMs) provide flexible natural language processing capabilities, while knowledge graphs (KGs) offer explicit, structured knowledge. Integrating the two in a complementary manner enables the development of reliable and verifiable AI systems. In particular, knowledge graph question answering (KGQA) has attracted attention as a means to reduce LLM hallucinations and to leverage knowledge beyond the training data. However, existing KGQA benchmark datasets are biased toward encyclopedic knowledge, limited to a single modality, and lack fine-grained spatiotemporal data, which limits their applicability to the real-world scenarios targeted by Embodied AI. We introduce HOME-KGQA, a novel KGQA benchmark dataset built on a multimodal KG of daily household activities. HOME-KGQA consists of complex, multi-hop natural language questions, each paired with a corresponding query written in a graph database query language. Compared to existing benchmarks, it includes more challenging questions that involve multi-level spatiotemporal reasoning, multimodal grounding, and aggregate functions. Experimental results show that LLM-based KGQA methods fall short on HOME-KGQA of the performance they achieve on existing datasets. This highlights significant challenges that must be addressed before KGQA systems can be deployed in real-world settings. Our dataset is available at https://github.com/aistairc/home-kgqa.