In real life, data are often of poor quality as a result of, for instance, uncertainty, mismeasurements, missing values, or bad inputs. This issue hampers an implicit yet crucial operation of every database management system: equality testing. Indeed, equality is ultimately a context-dependent operation with a plethora of interpretations. In practice, handling the different types of equality is left to programmers, who must struggle with those interpretations in their code. We propose a new lattice-based declarative framework to address this problem. It allows numerous semantics for equality to be specified at a high level of abstraction. To go beyond tuple equality, we study functional dependencies (FDs) in the light of our framework. First, we define abstract FDs, which generalize classical FDs. These lead us to consider particular interpretations of equality, which we call realities. Building upon realities and possible/certain answers, we introduce possible/certain FDs and give some related complexity results.
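To make the dependence of FDs on equality concrete, here is a minimal sketch. It is not the lattice-based framework described above. A classical FD X → Y holds on a relation when any two tuples that agree on X also agree on Y. The sketch replaces "agree" with a pluggable equality predicate per attribute, so the same FD can hold under one interpretation of equality and fail under another. The names `fd_holds`, `strict_eq`, `null_never_eq` and `casefold_eq`, as well as the sample data, are hypothetical illustrations.

```python
from itertools import combinations
from typing import Callable, Dict, Mapping, Sequence

# Hypothetical types: a tuple maps attribute names to values, and an
# equality predicate decides whether two values count as "equal".
Row = Mapping[str, object]
EqPred = Callable[[object, object], bool]


def strict_eq(a: object, b: object) -> bool:
    # Plain Python equality: None == None is True.
    return a == b


def null_never_eq(a: object, b: object) -> bool:
    # SQL-like reading: a missing value (None) equals nothing, not even None.
    return a is not None and b is not None and a == b


def casefold_eq(a: object, b: object) -> bool:
    # Case-insensitive equality for strings, strict equality otherwise.
    if isinstance(a, str) and isinstance(b, str):
        return a.casefold() == b.casefold()
    return a == b


def fd_holds(rel: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str],
             eq: Dict[str, EqPred]) -> bool:
    """Check lhs -> rhs on rel, comparing attribute A with eq[A]."""
    def agree(t1: Row, t2: Row, attrs: Sequence[str]) -> bool:
        return all(eq[a](t1[a], t2[a]) for a in attrs)

    # Examine every pair of distinct tuples (by position in rel).
    for t1, t2 in combinations(rel, 2):
        if agree(t1, t2, lhs) and not agree(t1, t2, rhs):
            return False
    return True


people = [{"name": "Alice", "city": "Paris"},
          {"name": "alice", "city": "Lyon"}]
staff = [{"emp": None, "dept": "A"},
         {"emp": None, "dept": "B"}]

# name -> city holds under strict equality but fails when names are
# compared case-insensitively.
print(fd_holds(people, ["name"], ["city"], {"name": strict_eq, "city": strict_eq}))    # True
print(fd_holds(people, ["name"], ["city"], {"name": casefold_eq, "city": strict_eq}))  # False

# emp -> dept fails if None equals None, but holds if None equals nothing.
print(fd_holds(staff, ["emp"], ["dept"], {"emp": strict_eq, "dept": strict_eq}))      # False
print(fd_holds(staff, ["emp"], ["dept"], {"emp": null_never_eq, "dept": strict_eq}))  # True
```

Passing one predicate per attribute is a simplification chosen for brevity. The framework in this work organizes equality semantics at a higher level of abstraction, which this sketch does not attempt to model.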