Quantifying the contribution of database facts to query answers has been studied as a means of explanation. The Banzhaf value, originally developed in game theory, is a natural measure of fact contribution, yet its efficient computation for select-project-join-union queries is challenging. In this paper, we introduce three algorithms to compute the Banzhaf value of database facts: an exact algorithm, an anytime deterministic approximation algorithm with relative error guarantees, and an algorithm for ranking and top-$k$. They have three key building blocks: compilation of query lineage into an equivalent function that allows efficient Banzhaf value computation; dynamic programming computation of the Banzhaf values of variables in a Boolean function using the Banzhaf values for its constituent functions; and a mechanism to efficiently compute lower and upper bounds on Banzhaf values for any positive DNF function. We complement the algorithms with a dichotomy for the Banzhaf-based ranking problem: given two facts, deciding whether the Banzhaf value of one is greater than that of the other is tractable for hierarchical queries and intractable for non-hierarchical queries. We show experimentally that our algorithms significantly outperform exact and approximate algorithms from prior work, in most cases by up to two orders of magnitude. Our algorithms can also handle challenging problem instances that are beyond the reach of prior work.
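To make the measured quantity concrete, the following is a minimal brute-force sketch, not one of the three algorithms introduced in the paper. It assumes the unnormalized Banzhaf value of a fact $x$ in a lineage $\varphi$, that is, the number of subsets $Y$ of the remaining facts for which adding $x$ flips $\varphi$ from false to true. The lineage, the fact names (`r1`, `s1`, ...), and the helper names `evaluate` and `banzhaf` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def evaluate(dnf, true_facts):
    """Evaluate a positive DNF given as a list of clauses (sets of fact names)."""
    return any(clause <= true_facts for clause in dnf)


def banzhaf(dnf, fact):
    """Brute-force Banzhaf value of `fact`: count the subsets Y of the other
    facts for which adding `fact` flips the lineage from false to true."""
    others = sorted(set().union(*dnf) - {fact})
    total = 0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            y = frozenset(subset)
            # For a positive (monotone) DNF this difference is 0 or 1.
            total += evaluate(dnf, y | {fact}) - evaluate(dnf, y)
    return total


# Hypothetical lineage: (r1 AND s1) OR (r1 AND s2) OR (r2 AND s3)
lineage = [
    frozenset({"r1", "s1"}),
    frozenset({"r1", "s2"}),
    frozenset({"r2", "s3"}),
]

facts = sorted(set().union(*lineage))
values = {f: banzhaf(lineage, f) for f in facts}

# Banzhaf-based ranking: facts ordered by decreasing Banzhaf value.
ranking = sorted(values, key=values.get, reverse=True)
print(values)
print(ranking)
```

This enumeration takes time exponential in the number of facts, which is the cost that lineage compilation and the lower and upper bounds are meant to avoid.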