We present KIF, a Wikidata-based framework for virtually integrating heterogeneous knowledge sources. KIF is written in Python and is released as open source. It uses Wikidata's data model and vocabulary, together with user-defined mappings, to construct a unified view of the underlying sources while keeping track of the context and provenance of their statements. The underlying sources can be triplestores, relational databases, CSV files, etc., and need not use the vocabulary or RDF encoding of Wikidata. The end result is a virtual knowledge base that behaves like an "extended Wikidata" and can be queried using a simple but expressive pattern language defined in terms of Wikidata's data model. In this paper, we present the design and implementation of KIF, discuss how we have used it to solve a real integration problem in the domain of chemistry (involving Wikidata, PubChem, and IBM CIRCA), and present experimental results on the performance and overhead of KIF.
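To make the idea of a virtual, provenance-tracking view more concrete, the following is a minimal sketch in plain Python. It is not KIF's API: the names `Statement`, `csv_source`, `triple_source`, and `filter_statements` are hypothetical, the entity and property identifiers are illustrative placeholders, and the "pattern" is reduced to optional subject/property/value fields. It only illustrates how user-defined mappings might turn differently shaped sources into Wikidata-style statements that are filtered lazily, without copying the data into one store.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Statement:
    """A Wikidata-style statement annotated with the source it came from."""
    subject: str
    property: str
    value: str
    source: str  # provenance: which underlying source produced the statement


# A source is a zero-argument callable that yields statements on demand.
Source = Callable[[], Iterator[Statement]]


def csv_source(rows: Iterable[dict], column_to_property: dict) -> Source:
    """Map tabular rows to statements through a user-defined column mapping."""
    def read() -> Iterator[Statement]:
        for row in rows:
            for column, prop in column_to_property.items():
                if row.get(column):
                    yield Statement(row["id"], prop, row[column], "csv")
    return read


def triple_source(triples: Iterable[tuple]) -> Source:
    """Wrap (subject, property, value) triples that already use the shared vocabulary."""
    def read() -> Iterator[Statement]:
        for s, p, v in triples:
            yield Statement(s, p, v, "triples")
    return read


def filter_statements(
    sources: Iterable[Source],
    subject: Optional[str] = None,
    property: Optional[str] = None,
    value: Optional[str] = None,
) -> Iterator[Statement]:
    """Yield statements from every source that match the pattern (None = wildcard)."""
    for read in sources:
        for st in read():
            if subject is not None and st.subject != subject:
                continue
            if property is not None and st.property != property:
                continue
            if value is not None and st.value != value:
                continue
            yield st


# Illustrative data; the identifiers below are placeholders.
rows = [{"id": "Q2270", "formula": "C6H6", "mass": "78.11"}]
triples = [("Q2270", "P274", "C6H6"), ("Q283", "P274", "H2O")]

sources = [
    csv_source(rows, {"formula": "P274", "mass": "P2067"}),
    triple_source(triples),
]

# One pattern is answered by both sources; each hit records where it came from.
hits = list(filter_statements(sources, subject="Q2270", property="P274"))
for hit in hits:
    print(hit)
```

Because each source is only read when a pattern is evaluated, the combined view stays virtual; a real system would instead translate the pattern into each source's native query language rather than scanning every statement.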