We present an index structure to speed up the evaluation of free-connex acyclic conjunctive queries (fc-ACQs) over relational databases. The main ingredient of the index associated with a given database $D$ is an auxiliary database $D_{col}$. Our main result states that for any fc-ACQ $Q$ over $D$, we can count the answers of $Q$, or enumerate them with constant delay, after a preprocessing phase that takes time linear in the size of $D_{col}$. Unlike previous indexing methods based on values or order (e.g., B+ trees), our index is based on structural symmetries among the tuples of a database, and the size of $D_{col}$ is related to the number of colors assigned to $D$ by Scheidt and Schweikardt's "relational color refinement" (2025). In the particular case of graphs, this coincides with the minimal number of classes in an equitable partition of the graph. For example, the size of $D_{col}$ is logarithmic for binary trees and constant for regular graphs. Even in the worst case, where the tuples of $D$ have no structural symmetries at all, the size of $D_{col}$ is still linear in the size of $D$. Since the size of $D_{col}$ is bounded by the size of $D$ and can be much smaller (even constant for some families of databases), our index is the first foundational result on indexing the internal structural symmetries of a database to evaluate all fc-ACQs in time potentially strictly smaller than the database size.
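To illustrate the graph case mentioned above, the following minimal sketch implements classical color refinement on undirected graphs. It is not the relational variant of Scheidt and Schweikardt and not the construction of $D_{col}$ itself. Starting from a uniform coloring, each round splits color classes by the multiset of neighbor colors until the partition stops changing; the stable partition is the coarsest equitable partition. The helper names (`color_refinement`, `cycle`, `perfect_binary_tree`, `num_colors`) are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def color_refinement(adj):
    """Classical (graph) color refinement, an illustrative sketch.

    adj: dict mapping each vertex to the list of its neighbours.
    Returns a dict vertex -> color id. Starting from a uniform coloring,
    the stable coloring is the coarsest equitable partition.
    """
    colors = {v: 0 for v in adj}
    num = 1
    while True:
        # Signature = own color + multiset of neighbour colors.
        # Including the own color means classes are only ever split.
        sigs = {
            v: (colors[v], tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
            for v in adj
        }
        # Relabel distinct signatures with small integers.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        if len(palette) == num:
            return colors  # no class was split: the coloring is stable
        colors = {v: palette[sigs[v]] for v in adj}
        num = len(palette)


def cycle(n):
    """Cycle on n >= 3 vertices (a 2-regular graph)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def perfect_binary_tree(depth):
    """Perfect binary tree with 2^(depth+1) - 1 vertices (heap indexing)."""
    n = 2 ** (depth + 1) - 1
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for c in (2 * i + 1, 2 * i + 2):
            if c < n:
                adj[i].append(c)
                adj[c].append(i)
    return adj


def num_colors(adj):
    return len(set(color_refinement(adj).values()))


if __name__ == "__main__":
    print(num_colors(cycle(10)))  # regular graph: 1 color
    for d in range(1, 6):
        # vertices vs. colors: one color per tree level
        print(2 ** (d + 1) - 1, num_colors(perfect_binary_tree(d)))
```

On a cycle, every vertex keeps the same color. A perfect binary tree of depth $d$, with $2^{d+1}-1$ vertices, ends up with $d+1$ colors (one per level), i.e., logarithmically many in the number of vertices. This mirrors, for graphs, the constant and logarithmic cases stated above.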