Information inequalities appear in many database applications, such as query output size bounds, query containment, and implication between data dependencies. Recently, Khamis et al. proposed to study the algorithmic aspects of information inequalities, including the information inequality problem: decide whether a linear inequality over entropies of random variables is valid. While the decidability of this problem is a major open question, applications often involve only inequalities that adhere to specific syntactic forms linked to useful semantic invariance properties. This paper studies the information inequality problem in different syntactic and semantic scenarios that arise from database applications. Focusing on the boundary between tractability and intractability, we show that the information inequality problem is coNP-complete if restricted to normal polymatroids, and solvable in polynomial time if relaxed to monotone functions. We also examine syntactic restrictions related to query output size bounds, and provide an alternative proof, through monotone functions, for the polynomial-time computability of the entropic bound over simple sets of degree constraints.
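To make the restricted problem concrete, the following is a minimal sketch of a brute-force validity check over normal polymatroids. It assumes the definition commonly used in the literature, which the abstract itself does not state: a normal polymatroid is a non-negative combination of step functions h_W, where W is a proper subset of the variables and h_W(X) = 0 if X is a subset of W and 1 otherwise. Under that assumption, a linear inequality holds for all normal polymatroids exactly when it holds for every step function, so a violating W serves as a short counterexample. The function names and the dictionary encoding of the inequality are hypothetical, and the enumeration takes exponential time; it is not the paper's method.

```python
from itertools import combinations


def proper_subsets(variables):
    """Yield every proper subset W of the variable set as a frozenset."""
    items = sorted(variables)
    for r in range(len(items)):  # r < len(items) excludes the full set
        for combo in combinations(items, r):
            yield frozenset(combo)


def step_function(w):
    """Step function h_W (assumed definition): 0 if X is a subset of W, else 1."""
    return lambda x: 0 if frozenset(x) <= w else 1


def find_counterexample(coeffs, variables):
    """Search for a step function violating sum_X c_X * h(X) >= 0.

    coeffs maps each variable set X (a frozenset) to its coefficient c_X.
    Returns a violating W, or None if every step function satisfies
    the inequality.
    """
    for w in proper_subsets(variables):
        h = step_function(w)
        if sum(c * h(x) for x, c in coeffs.items()) < 0:
            return w
    return None


# Submodularity: h(XY) + h(YZ) - h(XYZ) - h(Y) >= 0
submodularity = {
    frozenset("XY"): 1,
    frozenset("YZ"): 1,
    frozenset("XYZ"): -1,
    frozenset("Y"): -1,
}

# An inequality that contradicts monotonicity: h(X) - h(XY) >= 0
non_monotone = {frozenset("X"): 1, frozenset("XY"): -1}

print(find_counterexample(submodularity, "XYZ"))
print(find_counterexample(non_monotone, "XY"))
```

The first call finds no violating step function, as expected for a Shannon inequality. The second returns the set containing only X, since h_W(X) = 0 while h_W(XY) = 1 for W = {X}.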