We present a comprehensive benchmark dataset for Knowledge Graph Question Answering in Materials Science (KGQA4MAT), with a focus on metal-organic frameworks (MOFs). A knowledge graph for metal-organic frameworks (MOF-KG) has been constructed by integrating structured databases with knowledge extracted from the literature. To make MOF-KG more accessible to domain experts, we aim to develop a natural language interface for querying the knowledge graph. To this end, we have developed a benchmark comprising 161 complex questions that involve comparison, aggregation, and complicated graph structures. Each question is rephrased in three additional variations, yielding 644 questions in total (161 × 4) paired with 161 KG queries. To evaluate the benchmark, we have developed a systematic approach that uses ChatGPT to translate natural language questions into formal KG queries. We also apply the approach to the well-known QALD-9 dataset, demonstrating ChatGPT's potential for addressing KGQA tasks across different platforms and query languages. The benchmark and the proposed approach aim to stimulate further research and development of user-friendly and efficient interfaces for querying domain-specific materials science knowledge graphs, thereby accelerating the discovery of novel materials.