Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, at a time when the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query rests entirely on the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and observe that both the data quality and the efficiency of querying improve. Under consideration in Theory and Practice of Logic Programming (TPLP).