Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality

Query answering over data with dependencies plays a central role in most applications of dependencies. The problem is commonly solved by using a suitable variant of the chase algorithm to compute a universal model of the dependencies and the data, thus making explicit all knowledge implicit in the dependencies. After this preprocessing step, an arbitrary conjunctive query over the dependencies and the data can be answered by evaluating it over the computed universal model. If, however, the query to be answered is fixed and known in advance, computing the universal model is often inefficient, as many inferences made during this process can be irrelevant to the given query. In such cases, a goal-driven approach, which avoids drawing unnecessary inferences, promises to be more efficient and thus preferable in practice. In this paper, we present what we believe to be the first technique for goal-driven query answering over first- and second-order dependencies with equality reasoning. Our technique transforms the input dependencies so that applying the chase to the output avoids many inferences that are irrelevant to the query. The transformation proceeds in several steps, which comprise the following three novel techniques. First, we present a variant of the singularisation technique by Marnette [60] that is applicable to second-order dependencies and that corrects an incompleteness of a related formulation by ten Cate et al. [74]. Second, we present a relevance analysis technique that can eliminate from the input those dependencies that provably do not contribute to query answers. Third, we present a variant of the magic sets algorithm [19] that can handle second-order dependencies with equality reasoning. We also present the results of an extensive empirical evaluation, which show that goal-driven query answering can be orders of magnitude faster than computing the full universal model.
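To make the contrast between full materialisation and goal-driven evaluation concrete, the following minimal sketch compares the two on a toy reachability program. It is only an illustration under strong simplifying assumptions: the rules are plain Datalog (no existential quantifiers, no function symbols, no equality), the "magic" filter is hand-written for one query binding, and the names `full_closure`, `goal_driven`, and `EDGES` are hypothetical. It is not the transformation presented in this paper.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]

# Toy data (an assumption for illustration): two disconnected chains.
EDGES: Set[Edge] = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("y", "z")}


def full_closure(edges: Set[Edge]) -> Set[Edge]:
    """Bottom-up evaluation of the whole program:
         path(x, y) <- edge(x, y)
         path(x, z) <- path(x, y), edge(y, z)
    Every derivable path fact is materialised, whether or not a query needs it.
    """
    path = set(edges)
    while True:
        new = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2} - path
        if not new:
            return path
        path |= new


def goal_driven(edges: Set[Edge], source: str) -> Set[Edge]:
    """Bottom-up evaluation of a magic-sets-style rewriting for the query
    path(source, ?):
         magic(source)
         path(x, y) <- magic(x), edge(x, y)
         path(x, z) <- magic(x), path(x, y), edge(y, z)
    Because the recursion is left-linear, the magic set stays {source}.
    """
    magic = {source}
    path = {(x, y) for (x, y) in edges if x in magic}
    while True:
        new = {
            (x, z)
            for (x, y) in path
            if x in magic
            for (y2, z) in edges
            if y == y2
        } - path
        if not new:
            return path
        path |= new
```

On this data, both functions agree on the nodes reachable from `a`, but `goal_driven(EDGES, "a")` derives only facts whose first argument is `a`, whereas `full_closure(EDGES)` also derives every fact about the unrelated `x`, `y`, `z` chain.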
