Functional dependencies (FDs) are fundamental integrity constraints in relational databases, but discovering them under incremental updates remains challenging. Static algorithms are inefficient because they must be re-executed in full after every update, while existing incremental algorithms suffer from severe performance and memory bottlenecks. To address these challenges, this paper proposes EAIFD, a novel algorithm for incremental FD discovery. EAIFD maintains a partial hypergraph of difference sets and reframes incremental FD discovery as minimal hitting set enumeration on that hypergraph, avoiding full re-runs. EAIFD introduces two key innovations. First, a multi-attribute hash table ($MHT$) stores high-frequency key-value mappings of valid FDs; its memory consumption is proven to be independent of the dataset size. Second, a two-step validation strategy efficiently validates the enumerated candidates: it first uses $MHT$ to reduce the validation space and then selectively loads data blocks to validate the remaining candidates in batches, avoiding repeated I/O operations. Experimental results on real-world datasets demonstrate the significant advantages of EAIFD. Compared to existing algorithms, EAIFD achieves up to an order-of-magnitude speedup in runtime while reducing memory usage by over two orders of magnitude, establishing it as a highly efficient and scalable solution for incremental FD discovery.
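The link between difference sets and minimal hitting sets can be illustrated with a small static sketch. It is not EAIFD itself: it rebuilds all difference sets from scratch, enumerates hitting sets by brute force, and includes neither the partial hypergraph maintenance, the $MHT$, nor the two-step validation. The function names (`difference_sets`, `minimal_hitting_sets`, `minimal_fds`) and the toy table are assumptions made for illustration. The underlying fact is standard: X -> A holds exactly when every pair of tuples that differs on A also differs on some attribute of X, so the minimal left-hand sides for A are the minimal hitting sets of the difference sets containing A, with A removed.

```python
from itertools import combinations


def difference_sets(rows, attrs):
    """Collect, for each pair of rows, the set of attributes on which they disagree."""
    diffs = set()
    for r, s in combinations(rows, 2):
        d = frozenset(a for a in attrs if r[a] != s[a])
        if d:  # identical rows impose no constraint
            diffs.add(d)
    return diffs


def minimal_hitting_sets(edges, vertices):
    """Brute-force enumeration by increasing size; suitable for toy inputs only."""
    if any(not e for e in edges):
        return []  # an empty edge can never be hit
    found = []
    for k in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), k):
            c = set(cand)
            if any(h <= c for h in found):
                continue  # contains a smaller hitting set, so not minimal
            if all(c & e for e in edges):
                found.append(frozenset(c))
    return found


def minimal_fds(rows, attrs):
    """Return minimal FDs as (lhs_tuple, rhs) pairs."""
    diffs = difference_sets(rows, attrs)
    fds = []
    for rhs in attrs:
        # Hyperedges for this right-hand side: difference sets containing rhs, minus rhs.
        edges = {d - {rhs} for d in diffs if rhs in d}
        lhs_vertices = [a for a in attrs if a != rhs]
        for lhs in minimal_hitting_sets(edges, lhs_vertices):
            fds.append((tuple(sorted(lhs)), rhs))
    return fds


# Hypothetical toy table, not taken from the paper's datasets.
rows = [
    {"zip": "10001", "city": "NYC", "name": "Ann"},
    {"zip": "10001", "city": "NYC", "name": "Bob"},
    {"zip": "94105", "city": "SF", "name": "Ann"},
]
attrs = ["zip", "city", "name"]
print(minimal_fds(rows, attrs))
```

On this table the sketch reports `city -> zip` and `zip -> city`, and no FD with `name` on the right-hand side, because the first two rows agree on every other attribute yet differ on `name`. An incremental algorithm in the spirit described above would update the hyperedges as tuples arrive rather than recomputing every pair.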