Datasets for the experimental evaluation of knowledge graph refinement algorithms typically contain only ground facts and retain very limited schema-level knowledge, even when such information is available in the source knowledge graphs. This limits the evaluation of methods that rely on rich ontological constraints, reasoning, or neurosymbolic techniques, and ultimately prevents assessing their performance on large-scale, real-world knowledge graphs. In this paper, we present \resource{}, the first resource that provides a workflow for extracting datasets that include both schema and ground facts, ready for machine learning and reasoning services, along with the resulting curated suite of datasets. The workflow also handles inconsistencies detected when both schema and facts are kept, and leverages reasoning to entail implicit knowledge. The suite includes newly extracted datasets from KGs with expressive schemas and also enriches existing datasets with schema information. Each dataset is serialized in OWL, making it ready for reasoning services. Moreover, we provide utilities for loading the datasets into the tensor representations typical of standard machine learning libraries.
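To illustrate what a tensor representation of ground facts can look like, the following minimal sketch maps subject-predicate-object triples to integer index rows. It does not use the utilities shipped with the resource: the function names `build_index` and `encode_triples`, the `ex:` identifiers, and the toy facts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_index(symbols: List[str]) -> Dict[str, int]:
    """Assign an integer id to each distinct symbol, in sorted order."""
    return {symbol: i for i, symbol in enumerate(sorted(set(symbols)))}


def encode_triples(
    triples: List[Triple],
) -> Tuple[List[List[int]], Dict[str, int], Dict[str, int]]:
    """Encode (subject, predicate, object) triples as rows of integer ids.

    Hypothetical helper: entities (subjects and objects) share one id space,
    relations (predicates) use a separate one.
    """
    entity_ids = build_index([s for s, _, _ in triples] + [o for _, _, o in triples])
    relation_ids = build_index([p for _, p, _ in triples])
    rows = [[entity_ids[s], relation_ids[p], entity_ids[o]] for s, p, o in triples]
    return rows, entity_ids, relation_ids


# Toy ground facts (illustrative only, not taken from any dataset in the suite).
facts: List[Triple] = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
]

rows, entity_ids, relation_ids = encode_triples(facts)
print(rows)  # [[1, 1, 0], [2, 1, 0], [1, 0, 2]]
```

The resulting list of rows has shape (number of facts, 3) and can be passed to an array constructor such as `numpy.asarray` or `torch.tensor`. Sorting the symbols before assigning ids keeps the encoding reproducible across runs.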