Query evaluation over probabilistic databases is a notoriously intractable problem -- not only in combined complexity, but for many natural queries in data complexity as well. This motivates the study of probabilistic query evaluation through the lens of approximation algorithms, and particularly of combined FPRASes, whose runtime is polynomial in both the query and instance size. In this paper, we focus on tuple-independent probabilistic databases over binary signatures, which can be equivalently viewed as probabilistic graphs. We study in which cases we can devise combined FPRASes for probabilistic query evaluation in this setting. We settle the complexity of this problem for a variety of query and instance classes, by proving both approximability and (conditional) inapproximability results. This allows us to deduce many corollaries of possible independent interest. For example, we show how the results of Arenas et al. on counting fixed-length strings accepted by an NFA imply the existence of an FPRAS for the two-terminal network reliability problem on directed acyclic graphs: this was an open problem until now. We also show that one cannot extend the recent result of van Bremen and Meel that gives a combined FPRAS for self-join-free conjunctive queries of bounded hypertree width on probabilistic databases: neither the bounded-hypertree-width condition nor the self-join-freeness hypothesis can be relaxed. Finally, we complement all our inapproximability results with unconditional lower bounds, showing that DNNF provenance circuits must have at least moderately exponential size in combined complexity.