We present an approach to computing consistent answers to queries, possibly involving an aggregation operator, over databases that operate under a star schema and may contain missing values and inconsistent data. Our approach builds on earlier work on consistent query answering for standard queries (with no aggregate operator) in multi-table databases. In that work, we presented polynomial algorithms for computing either the exact consistent answer to a query or bounds of the exact answer, depending on whether the query involves a selection condition or not. In the present work, we consider databases operating under a star schema, which we call data warehouses, and we extend our previous work to queries involving aggregate operators, called analytic queries. In this context, we propose specific algorithms for computing exact consistent answers to queries, whether analytic or not, provided that the selection condition in the query satisfies the property of independency (i.e., the condition can be expressed as a conjunction of conditions, each involving a single attribute). We show that the overall time complexity of these specific algorithms is in O(W log W), where W is the size of the data warehouse. Finally, we discuss, in the context of our approach, analytic queries that involve a having clause associated with a group-by clause.
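To illustrate the independency property, the following minimal sketch builds a selection condition as a conjunction of predicates that each inspect a single attribute. It is not one of the algorithms of the paper: the helper `independent_condition`, the attribute names `city` and `year`, and the sample rows are all hypothetical, and missing values and inconsistencies are deliberately ignored.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

def independent_condition(
    per_attribute: Dict[str, Callable[[Any], bool]]
) -> Callable[[Row], bool]:
    """Build a condition that is a conjunction of single-attribute predicates.

    Hypothetical helper: each predicate sees the value of one attribute only,
    which is the shape required by the independency property.
    """
    def holds(row: Row) -> bool:
        # Conjunction: every single-attribute predicate must be satisfied.
        return all(pred(row[attr]) for attr, pred in per_attribute.items())
    return holds

# Hypothetical condition: city = 'Paris' AND year >= 2020.
cond = independent_condition({
    "city": lambda c: c == "Paris",
    "year": lambda y: y >= 2020,
})

# Hypothetical rows, assumed complete and consistent for simplicity.
rows: List[Row] = [
    {"city": "Paris", "year": 2021, "amount": 10},
    {"city": "Paris", "year": 2019, "amount": 7},
    {"city": "Lyon", "year": 2022, "amount": 5},
]

selected = [r for r in rows if cond(r)]
total = sum(r["amount"] for r in selected)
```

By contrast, a condition that compares two attributes of the same row with each other (for example, one attribute being greater than another) would not, in general, fit this form.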