The Bloom filter is a classic, widely used data structure for approximate membership queries. Learned Bloom filters improve memory efficiency by leveraging machine learning, and the partitioned learned Bloom filter (PLBF) is among the most memory-efficient variants. However, PLBF suffers from high computational complexity during construction, specifically $O(N^3k)$, where $N$ and $k$ are hyperparameters. In this paper, we propose three methods, fast PLBF, fast PLBF++, and fast PLBF#, which reduce the construction complexity to $O(N^2k)$, $O(Nk \log N)$, and $O(Nk \log k)$, respectively. Fast PLBF preserves the structure and memory efficiency of the original PLBF. Although fast PLBF++ and fast PLBF# may produce different structures, we theoretically prove that they are equivalent to PLBF under an ideal data distribution. Furthermore, we theoretically bound the difference in memory efficiency between PLBF and fast PLBF++ in non-ideal scenarios. Experiments on real-world datasets demonstrate that fast PLBF, fast PLBF++, and fast PLBF# are up to 233, 761, and 778 times faster to construct than the original PLBF, respectively. The experiments also confirm that fast PLBF yields the same data structure as PLBF, and that fast PLBF++ and fast PLBF# achieve nearly identical memory efficiency.
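For readers unfamiliar with approximate membership queries, the following is a minimal sketch of a classic (non-learned) Bloom filter. It is not the PLBF construction discussed above and is not taken from the paper. The class name `BloomFilter`, the parameters `capacity` and `fpr`, and the use of SHA-256 with double hashing are all assumptions made for illustration; the sizing uses the standard textbook formulas. The sketch shows the defining property: inserted keys are always reported as present, while non-keys are occasionally reported as present (false positives).

```python
import hashlib
import math


class BloomFilter:
    """Illustrative classic Bloom filter (hypothetical helper, not from the paper)."""

    def __init__(self, capacity: int, fpr: float) -> None:
        # Textbook sizing: bits = -n * ln(p) / (ln 2)^2, hashes = (bits / n) * ln 2.
        self.num_bits = max(1, math.ceil(-capacity * math.log(fpr) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing (an assumption of this sketch): derive all bit
        # positions from two 64-bit values taken from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position associated with the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True if all associated bits are set: never a false negative,
        # but possibly a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(capacity=1000, fpr=0.01)
    for i in range(1000):
        bf.add(f"key-{i}")
    print("key-42" in bf)  # inserted keys are always found
    false_positives = sum(f"other-{i}" in bf for i in range(10000))
    print(false_positives / 10000)  # empirical false positive rate
```

Learned Bloom filters, as described in the abstract, place a machine learning model in front of structures like this one so that fewer bits are needed for the same false positive rate; that model and the partitioning step are deliberately left out of this sketch.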