We consider the problem of finding the minimal-size factorization of the provenance of self-join-free conjunctive queries, i.e., we want to find an equivalent formula that \emph{minimizes the number of variable repetitions}. This problem is equivalent to solving the fundamental Boolean formula factorization problem for the restricted setting of the provenance formulas of self-join-free queries. While general Boolean formula minimization is $\Sigma^p_2$-complete, we show that the problem is NP-complete in our case. Additionally, we identify a large class of queries for which the problem can be solved in PTIME, expanding beyond the previously known tractable cases of read-once formulas and hierarchical queries. We describe connections between factorizations, variable elimination orders, and minimal query plans. We leverage these insights to create an Integer Linear Program (ILP) that solves the minimal factorization problem exactly. We also propose an algorithm based on Max-Flow Min-Cut (MFMC) that gives an efficient approximate solution. Importantly, we show that both the Linear Programming (LP) relaxation of our proposed ILP and our MFMC-based algorithm are \emph{always correct for all known PTIME cases}. Thus, we present two unified algorithms that recover all known PTIME cases in PTIME, yet also solve NP-complete cases exactly or approximately, as desired.
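As a small illustration of the objective (our own toy instance, not taken from the paper; the query, the tuple names $r_1, s_{11}, s_{12}$, and the length measure $\mathrm{len}$ are assumptions made for this sketch), consider the query $Q \,{:}{-}\, R(x), S(x,y)$ over a database with tuples $r_1 = R(1)$, $s_{11} = S(1,1)$, and $s_{12} = S(1,2)$. Writing $\mathrm{len}(\varphi)$ for the number of variable occurrences in a formula $\varphi$, the provenance in disjunctive normal form and one factorization of it compare as follows:
\[
\varphi_{\mathrm{DNF}} = r_1 s_{11} \vee r_1 s_{12}, \qquad \mathrm{len}(\varphi_{\mathrm{DNF}}) = 4,
\]
\[
\varphi' = r_1 (s_{11} \vee s_{12}), \qquad \mathrm{len}(\varphi') = 3 .
\]
Here $\varphi'$ mentions each of the three variables exactly once, so it is read-once and no equivalent formula can be shorter. The factorization problem asks for such a shortest equivalent formula in general, including cases where some repetitions are unavoidable.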