Filter data structures are widely used across computer science to answer approximate set-membership queries. In many applications the data grows dynamically, so a filter must expand along with the data it represents. However, existing methods for expanding filters cannot simultaneously maintain stable performance, memory footprint, and false positive rate. We address this problem with Aleph Filter, which makes the following contributions. (1) It supports all operations (insertions, queries, deletions, etc.) in constant time, no matter how much the data grows. (2) Given any rough estimate of how much the data will ultimately grow, Aleph Filter provides far better memory vs. false positive rate trade-offs than existing methods, even if the estimate is off by orders of magnitude.
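The following sketch illustrates why expansion is hard to get right. It models one simple expansion strategy, assumed here for illustration only; it is neither Aleph Filter nor a specific published baseline. Whenever the current filter fills up, the strategy allocates a new filter with twice the capacity and the same per-filter false positive rate `eps`. Every query then has to probe every filter in the chain. The function name `chained_expansion_stats` and its parameters are hypothetical.

```python
def chained_expansion_stats(initial_capacity: int, final_size: int, eps: float):
    """Hypothetical expansion baseline (illustration only, not Aleph Filter).

    When the current filter is full, a new filter with double the capacity
    and the same per-filter false positive rate `eps` is appended. Returns
    (number of filters a query must probe, overall false positive rate).
    """
    capacity = initial_capacity
    total_capacity = capacity
    num_filters = 1
    # Keep appending doubled filters until the chain can hold final_size keys.
    while total_capacity < final_size:
        capacity *= 2
        total_capacity += capacity
        num_filters += 1
    # A query for an absent key is a false positive if ANY filter in the
    # chain reports one. This assumes independent false positives per filter.
    overall_fpr = 1 - (1 - eps) ** num_filters
    return num_filters, overall_fpr


# Growing from 1,000 to 1,000,000 keys with eps = 1%:
n, fpr = chained_expansion_stats(1_000, 1_000_000, 0.01)
print(n, round(fpr, 4))
```

In this model the query cost (number of filters probed) and the overall false positive rate both grow with the number of expansions. Holding the overall rate fixed instead would require a smaller `eps` for each new filter, which costs more bits per entry. This is the three-way tension between performance, memory footprint, and false positive rate that the abstract describes.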