Data integration is the primary use case for knowledge graphs. However, the data to be integrated are typically not graphs; they come in different formats, such as CSV, XML, or relational databases. Façade-X is a recently proposed method for providing direct access to an open-ended set of data formats. The method includes a meta-model that specialises RDF to fit general data structures. This model makes it possible to express SPARQL queries that target data sources with those structures. Previous work formalised Façade-X and demonstrated how it can theoretically represent any format expressible with a context-free grammar, as well as the relational model. A reference implementation, SPARQL Anything, demonstrates the feasibility of the approach in practice. Notably, Façade-X uses only a fragment of RDF and, consequently, not all SPARQL queries are satisfiable (i.e. can yield a solution) when evaluated over a Façade-X graph. In this article, we consolidate Façade-X and study the satisfiability of basic graph patterns. The theory is accompanied by an algorithm for deciding the satisfiability of basic graph patterns on Façade-X data sources. Furthermore, we provide extensive experiments with a proof-of-concept implementation, demonstrating practical feasibility, including on real-world queries. Our results pave the way for studying query execution strategies for Façade-X data access with SPARQL and for supporting developers in building more efficient data integration systems for knowledge graphs.
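To give an intuition for why some basic graph patterns cannot yield a solution over a Façade-X graph, the following is a minimal sketch of a necessary-condition check. It is not the algorithm presented in the article. It assumes, for illustration only, that a Façade-X graph uses just three kinds of predicates (`rdf:type`, container membership properties `rdf:_1`, `rdf:_2`, ..., and properties in the `xyz:` data namespace) and that literals never occur as subjects. The string encoding of terms and the function names `predicate_allowed` and `maybe_satisfiable` are hypothetical.

```python
import re

# Namespaces assumed for this sketch.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XYZ = "http://sparql.xyz/facade-x/data/"

# Container membership properties: rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(re.escape(RDF) + r"_[1-9][0-9]*")


def is_var(term: str) -> bool:
    """Variables are encoded as strings starting with '?' (hypothetical encoding)."""
    return term.startswith("?")


def is_literal(term: str) -> bool:
    """Literals are encoded as strings starting with a double quote (hypothetical encoding)."""
    return term.startswith('"')


def predicate_allowed(p: str) -> bool:
    """Assumed rule: only rdf:type, rdf:_n and xyz:* predicates can occur."""
    if is_var(p):
        return True  # a variable may still bind to an allowed predicate
    return p == RDF + "type" or bool(MEMBERSHIP.fullmatch(p)) or p.startswith(XYZ)


def maybe_satisfiable(bgp) -> bool:
    """Return False if some triple pattern can never match under the assumed rules.

    True only means that no violation was found; this is a necessary
    condition, not a full decision procedure.
    """
    for s, p, o in bgp:
        if is_literal(s) or not predicate_allowed(p):
            return False
    return True


# A pattern over rows and cells, as one might write against a CSV source.
rows_and_cells = [("?root", RDF + "_1", "?row"), ("?row", XYZ + "name", "?value")]
# A pattern using a predicate outside the assumed vocabulary.
foreign_predicate = [("?s", "http://xmlns.com/foaf/0.1/name", "?n")]

print(maybe_satisfiable(rows_and_cells))
print(maybe_satisfiable(foreign_predicate))
```

Under these assumptions, the first pattern passes the check, whereas the second is rejected because its constant predicate lies outside the assumed vocabulary. A complete decision procedure would also have to account for how variables are shared across triple patterns.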