Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, at a time when the hypercentralization of data is increasingly under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query rests entirely on the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not only the data quality but also the efficiency of querying improves. Under consideration in Theory and Practice of Logic Programming (TPLP).