We present an index structure, called the color-index, to speed up the evaluation of acyclic conjunctive queries (ACQs) over binary schemas. The color-index is based on the color refinement algorithm, a widely used subroutine in graph isomorphism testing. Given a database $D$, we use a suitable version of the color refinement algorithm to produce a stable coloring of $D$, that is, an assignment of each element of the active domain of $D$ to a color from a set $C_D$. The main ingredient of the color-index is a particular database $D_c$ whose active domain is $C_D$ and whose size is at most $|D|$. Using the color-index, we can evaluate any free-connex ACQ $Q$ over $D$ with preprocessing time $O(|Q| \cdot |D_c|)$ and constant-delay enumeration. Furthermore, we can count the number of results of $Q$ over $D$ in time $O(|Q| \cdot |D_c|)$. Given that $|D_c|$ can be much smaller than $|D|$ (even constant-size for some families of databases), the color-index is the first index structure for evaluating free-connex ACQs that supports efficient enumeration and counting with running time that may be strictly smaller than the database size.
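To make the notions of stable coloring and of the database $D_c$ more concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the construction from the paper: the database is assumed to be a set of triples `(relation, a, b)`, the refinement is the textbook color refinement adapted to labeled directed edges, and `quotient_database` is a hypothetical helper that simply replaces every constant by its color. The actual "suitable version" of the algorithm and the exact contents of $D_c$ may differ.

```python
def color_refinement(facts):
    """Return a stable coloring of a binary-schema database.

    `facts` is a set of (relation, a, b) triples (an assumed encoding).
    The result maps each active-domain element to an integer color.
    """
    domain = sorted({x for _, a, b in facts for x in (a, b)})
    color = {v: 0 for v in domain}  # start with a single color
    while True:
        # Collect, for every element, the labeled colors of its neighbors.
        seen = {v: [] for v in domain}
        for r, a, b in facts:
            seen[a].append((r, "out", color[b]))
            seen[b].append((r, "in", color[a]))
        # New signature: old color plus the sorted multiset of neighbor colors.
        sig = {v: (color[v], tuple(sorted(seen[v]))) for v in domain}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        refined = {v: palette[sig[v]] for v in domain}
        # Signatures include the old color, so classes can only split.
        # An unchanged class count therefore means the partition is stable.
        if len(set(refined.values())) == len(set(color.values())):
            return refined
        color = refined


def quotient_database(facts, color):
    """Hypothetical stand-in for D_c: replace each constant by its color."""
    return {(r, color[a], color[b]) for r, a, b in facts}


# A directed cycle on n elements: every element looks the same.
n = 6
cycle = {("E", f"v{i}", f"v{(i + 1) % n}") for i in range(n)}
cycle_colors = color_refinement(cycle)
cycle_quotient = quotient_database(cycle, cycle_colors)

# A directed path on three elements: every element looks different.
path = {("E", "a", "b"), ("E", "b", "c")}
path_colors = color_refinement(path)
path_quotient = quotient_database(path, path_colors)
```

In this sketch the quotient is a set of facts obtained by renaming constants, so it can never contain more facts than the input, which mirrors the bound $|D_c| \le |D|$. For the directed cycle, all elements receive the same color and the quotient collapses to a single fact regardless of $n$, an example of a database family for which the quotient has constant size. For the path, no two elements share a color and the quotient is as large as the input.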