Finding Smallest Witnesses for Conjunctive Queries

A witness is a sub-database that preserves the query results of the original database while being much smaller. It has wide applications in query rewriting and debugging, query explanation, IoT analytics, multi-layer network routing, etc. In this paper, we study the smallest witness problem (SWP) for the class of conjunctive queries (CQs) without self-joins. We first establish a dichotomy: SWP for a CQ can be solved in polynomial time if and only if the CQ has the {\em head-cluster property}, unless $\texttt{P} = \texttt{NP}$. We next turn to the approximate version, in which the witness is no longer required to be of minimum size. Surprisingly, we find that the {\em head-domination} property, previously identified for the deletion propagation problem \cite{kimelfeld2012maximizing}, also precisely captures the hardness of approximating SWP. SWP for any CQ with the head-domination property can be approximated within a constant factor in polynomial time, while SWP for any CQ without this property cannot be approximated within a logarithmic factor, unless $\texttt{P} = \texttt{NP}$. We further explore efficient approximation algorithms for CQs without the head-domination property: (1) we show a trivial algorithm that achieves a polynomially large approximation ratio for general CQs; (2) for any CQ with only one non-output attribute, such as star CQs, we show a greedy algorithm with a logarithmic approximation ratio; (3) for line CQs, which contain at least two non-output attributes, we relate SWP to the directed Steiner forest problem, whose algorithms can be applied to line CQs directly. Meanwhile, we establish a much higher lower bound, exponentially larger than the logarithmic lower bound obtained above. It remains open to close the gap between the lower and upper bounds for approximating SWP for CQs without the head-domination property.
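To make the definition of a witness concrete, the following is a minimal sketch that finds a smallest witness by exhaustive search for one small query. It is not one of the algorithms from the paper: the example query $Q(a, c) \leftarrow R_1(a, b), R_2(b, c)$, the function names `evaluate` and `smallest_witness`, and the toy data are all assumptions chosen for illustration, and the search takes exponential time, so it only serves to show what SWP asks for.

```python
from itertools import combinations


def evaluate(r1, r2):
    """Evaluate the illustrative query Q(a, c) :- R1(a, b), R2(b, c).

    Joins R1 and R2 on b and projects b away (b is the non-output attribute).
    """
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def smallest_witness(r1, r2):
    """Return a minimum-size sub-database (s1, s2) with the same query result.

    Exhaustive search over subsets of tuples in increasing size; this is an
    exponential-time illustration of the problem definition only.
    """
    target = evaluate(r1, r2)
    # Tag each tuple with its relation so subsets can be split back apart.
    tagged = [("R1", t) for t in sorted(r1)] + [("R2", t) for t in sorted(r2)]
    for k in range(len(tagged) + 1):
        for subset in combinations(tagged, k):
            s1 = {t for (rel, t) in subset if rel == "R1"}
            s2 = {t for (rel, t) in subset if rel == "R2"}
            if evaluate(s1, s2) == target:
                return s1, s2
    raise AssertionError("unreachable: the full database is always a witness")


# Hypothetical toy database with 6 tuples in total.
R1 = {("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")}
R2 = {("b1", "c1"), ("b2", "c1")}

w1, w2 = smallest_witness(R1, R2)
print(len(w1) + len(w2))  # 3: two R1 tuples sharing one b value, plus one R2 tuple
```

In this toy instance, both output tuples can be routed through the same value of the non-output attribute $b$, which is why three tuples suffice. Choosing such shared values well is the kind of covering decision that makes the general problem hard.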
