A possible world of an incomplete database table is obtained by replacing each occurrence of \texttt{NULL} with a value from the (infinite) domain of the corresponding attribute. A table satisfies a possible key or possible functional dependency (FD) constraint if at least one of its possible worlds satisfies the given key or FD. A table satisfies a certain key or certain FD if all of its possible worlds satisfy the constraint. Recently, an intermediate concept was introduced: a table satisfies a strongly possible key or FD if there exists a strongly possible world that satisfies the key or FD. A strongly possible world is obtained by imputing values from the active domains of the attributes, that is, from the values that already appear in the table. In the present paper, we study approximation measures of strongly possible keys and FDs. The measure $g_3$ is the ratio of the minimum number of tuples that must be removed so that the remaining table satisfies the constraint to the number of tuples in the table. We introduce a new measure $g_5$, the ratio of the minimum number of tuples that must be added to the table so that the result satisfies the constraint to the number of tuples in the table. The measure $g_5$ is meaningful because adding tuples may extend the active domains. We prove that if $g_5$ can be defined for a table and a constraint, then the $g_3$ value is always an upper bound on the $g_5$ value. However, the two measures are independent of each other in the sense that for any rational number $0\le\frac{p}{q}<1$ there are tables with an arbitrarily large number of rows and a constant number of columns that satisfy $g_3-g_5=\frac{p}{q}$. A possible world is usually obtained by adding many new values that did not previously occur in the table; the measure $g_5$ instead measures the smallest possible distortion of the active domains. We study the complexity of determining these approximation measures.
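The definitions of a strongly possible key and of $g_3$ can be illustrated with a small brute-force sketch in Python. It is not an algorithm from the paper and runs in exponential time, so it only suits toy tables. The names `NULL`, `active_domains`, `satisfies_sp_key` and `g3_sp_key` are hypothetical. The sketch also assumes that, after tuples are removed, the active domains are recomputed from the remaining tuples.

```python
from fractions import Fraction
from itertools import combinations, product

NULL = None  # stand-in for SQL NULL (an assumption of this sketch)

def active_domains(table):
    """For each column, the set of non-NULL values that appear in the table."""
    n_cols = len(table[0]) if table else 0
    return [{row[c] for row in table if row[c] is not NULL} for c in range(n_cols)]

def satisfies_sp_key(table, key):
    """True if some strongly possible world makes all rows distinct on `key`.

    `key` is a tuple of column indices. Only the key columns matter,
    so each row is reduced to its possible projections onto `key`.
    """
    doms = active_domains(table)
    choices = []
    for row in table:
        # A non-NULL cell is fixed; a NULL cell ranges over the active domain.
        per_col = [[row[c]] if row[c] is not NULL else list(doms[c]) for c in key]
        choices.append(list(product(*per_col)))
    # Try every combination of imputations (exponential, toy sizes only).
    return any(len(set(world)) == len(world) for world in product(*choices))

def g3_sp_key(table, key):
    """Minimum fraction of rows to remove so that the rest satisfies the key.

    Assumption: active domains are recomputed from the remaining rows.
    """
    n = len(table)
    if n == 0:
        return Fraction(0)
    for removed in range(n + 1):
        for keep in combinations(range(n), n - removed):
            if satisfies_sp_key([table[i] for i in keep], key):
                return Fraction(removed, n)

# Toy table: the NULL in column 1 can only become 2 or 3,
# and both choices collide with an existing row on key (0, 1).
table = [(1, NULL), (1, 2), (1, 3)]
print(satisfies_sp_key(table, (0, 1)))  # False
print(g3_sp_key(table, (0, 1)))         # 1/3
```

In this toy table the key fails even though an ordinary possible world exists (for example, imputing a fresh value 4). Removing the row that contains the \texttt{NULL} repairs it, which gives $g_3 = \frac{1}{3}$.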