Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability

From arXiv; 20 pages. Up to minor changes (including the correction of a minor error in the proof of Theorem 6.3), this article is identical to the ICDT'24 publication.

Query evaluation over probabilistic databases is a notoriously intractable problem -- not only in combined complexity, but for many natural queries in data complexity as well. This motivates the study of probabilistic query evaluation through the lens of approximation algorithms, and particularly of combined FPRASes, whose runtime is polynomial in both the query and instance size. In this paper, we focus on tuple-independent probabilistic databases over binary signatures, which can be equivalently viewed as probabilistic graphs. We study in which cases we can devise combined FPRASes for probabilistic query evaluation in this setting. We settle the complexity of this problem for a variety of query and instance classes, by proving both approximability and (conditional) inapproximability results. This allows us to deduce many corollaries of possible independent interest. For example, we show how the results of Arenas et al. on counting fixed-length strings accepted by an NFA imply the existence of an FPRAS for the two-terminal network reliability problem on directed acyclic graphs: this was an open problem until now. We also show that one cannot extend a recent result of van Bremen and Meel that gives a combined FPRAS for self-join-free conjunctive queries of bounded hypertree width on probabilistic databases: neither the bounded-hypertree-width condition nor the self-join-freeness hypothesis can be relaxed. Finally, we complement all our inapproximability results with unconditional lower bounds, showing that DNNF provenance circuits must have at least moderately exponential size in combined complexity.
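To make the two-terminal network reliability problem mentioned above concrete, here is a minimal sketch in Python. It is not the algorithm of the paper or of Arenas et al.: it only illustrates the quantity being approximated, using brute-force enumeration of possible worlds and a naive Monte Carlo estimator. The function names and the example graph are hypothetical, chosen purely for illustration. Note that naive sampling only gives an additive error guarantee, so it is not an FPRAS: when the true reliability is very small, it needs a very large number of samples to achieve a good relative error.

```python
import itertools
import random


def reachable(edges, source, target):
    """Return True if target is reachable from source via the given directed edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def exact_reliability(prob_edges, source, target):
    """Sum the probabilities of all possible worlds where target is reachable.

    Exponential in the number of edges, so usable only on tiny graphs.
    """
    edges = list(prob_edges)
    total = 0.0
    for keep in itertools.product([False, True], repeat=len(edges)):
        weight = 1.0
        world = []
        for edge, kept in zip(edges, keep):
            p = prob_edges[edge]
            # Each edge is kept independently with its own probability.
            weight *= p if kept else 1.0 - p
            if kept:
                world.append(edge)
        if reachable(world, source, target):
            total += weight
    return total


def sampled_reliability(prob_edges, source, target, samples, seed=0):
    """Naive Monte Carlo estimate (additive error only, NOT an FPRAS)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = [e for e, p in prob_edges.items() if rng.random() < p]
        hits += reachable(world, source, target)
    return hits / samples


# Hypothetical probabilistic DAG: two disjoint paths s -> a -> t and s -> b -> t.
EXAMPLE = {
    ("s", "a"): 0.5,
    ("a", "t"): 0.5,
    ("s", "b"): 0.5,
    ("b", "t"): 0.5,
}

if __name__ == "__main__":
    print(exact_reliability(EXAMPLE, "s", "t"))
    print(sampled_reliability(EXAMPLE, "s", "t", samples=20000))
```

On this example, each path survives with probability 0.25, so the exact reliability is 1 - 0.75^2 = 0.4375. The sampled estimate should land close to that value, but the number of samples needed grows quickly as the target probability shrinks, which is the gap an FPRAS closes.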
