Distributed Subweb Specifications for Traversing the Web

Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, at a time when the hypercentralization of data is increasingly under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query rests entirely on the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves. Under consideration in Theory and Practice of Logic Programming (TPLP).

