ZOR filters: fast and smaller than fuse filters

Probabilistic membership filters answer approximate membership queries quickly, with a controlled false-positive probability $\varepsilon$, and are widely used in storage, analytics, networking, and bioinformatics \cite{chang2008bigtable,dayan2018optimalbloom,broder2004network,harris2020improved,marchet2023scalable,chikhi2025logan,hernandez2025reindeer2}. In the static setting, state-of-the-art designs such as XOR and fuse filters achieve low space overhead and very fast queries, but their peeling-based construction succeeds only with high probability and must restart on failure, which complicates deterministic builds \cite{graf2020xor,graf2022binary,ulrich2023taxor}. We introduce \emph{ZOR filters}, a deterministic extension of XOR/fuse filters that guarantees construction terminates while preserving the same XOR-based query mechanism. Instead of restarting on failure, ZOR peels deterministically, abandoning a small fraction of keys, and restores false-positive-only semantics (no false negatives) by storing the abandoned keys in a compact auxiliary structure. In our experiments, the abandoned fraction drops below $1\%$ for moderate arity $N$, the number of slots each key is hashed to (e.g., $N\ge 5$), so the auxiliary structure holds a negligible fraction of keys. As a result, ZOR filters can achieve space within $1\%$ of the information-theoretic lower bound of $\log_2(1/\varepsilon)$ bits per key while retaining fuse-like query performance; the additional cost falls mainly on negative queries, which must also check the auxiliary structure. Our current prototype builds several-fold slower than highly optimized fuse builders because it maintains explicit incidence information during deterministic peeling; closing this gap is a target for future engineering work.
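The construction and query path described above can be illustrated with a small Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the hashing scheme (BLAKE2b from \texttt{hashlib} with made-up \texttt{person} tags), the hypothetical names \texttt{ZorFilterSketch} and \texttt{\_slots\_and\_fingerprint}, the rule of abandoning the lowest-index remaining key whenever peeling stalls, and the use of an exact Python set in place of the compact auxiliary structure. Slots are also drawn uniformly over the whole table rather than with the windowed layout of fuse filters, so the abandoned fraction this toy reports should not be compared with the figures quoted above.

```python
import hashlib
import math
from collections import deque


def _slots_and_fingerprint(key: bytes, n: int, m: int, fp_bits: int):
    """Hypothetical hashing: n distinct slots in [0, m) and an fp_bits-bit fingerprint."""
    digest = hashlib.blake2b(key, digest_size=8, person=b"zor-fp").digest()
    fp = int.from_bytes(digest, "little") & ((1 << fp_bits) - 1)
    slots, counter = [], 0
    while len(slots) < n:  # terminates because m >= n
        data = key + counter.to_bytes(4, "little")
        h = hashlib.blake2b(data, digest_size=8, person=b"zor-slot").digest()
        s = int.from_bytes(h, "little") % m
        if s not in slots:  # each key needs n distinct slots
            slots.append(s)
        counter += 1
    return slots, fp


class ZorFilterSketch:
    """Toy ZOR-style filter: deterministic peeling that abandons stuck keys."""

    def __init__(self, keys, n=3, fp_bits=8, size_factor=1.23):
        keys = list(dict.fromkeys(keys))  # drop duplicates, keep input order
        self.n, self.fp_bits = n, fp_bits
        self.m = max(n, math.ceil(len(keys) * size_factor))
        edges = [_slots_and_fingerprint(k, n, self.m, fp_bits) for k in keys]

        # Explicit incidence information: slot -> indices of keys still present.
        incidence = [set() for _ in range(self.m)]
        for i, (slots, _) in enumerate(edges):
            for s in slots:
                incidence[s].add(i)
        alive = [True] * len(keys)
        queue = deque(s for s in range(self.m) if len(incidence[s]) == 1)
        order, abandoned = [], []
        remaining, cursor = len(keys), 0

        def remove(i):
            # Detach key i from all its slots; enqueue slots that drop to degree 1.
            alive[i] = False
            for t in edges[i][0]:
                incidence[t].discard(i)
                if len(incidence[t]) == 1:
                    queue.append(t)

        while remaining:
            while queue:
                s = queue.popleft()
                if len(incidence[s]) != 1:
                    continue  # stale queue entry
                (i,) = incidence[s]
                order.append((i, s))  # key i will own slot s
                remove(i)
                remaining -= 1
            if remaining:
                # Peeling stalled: abandon the lowest-index live key (hypothetical rule).
                while not alive[cursor]:
                    cursor += 1
                abandoned.append(keys[cursor])
                remove(cursor)
                remaining -= 1

        # Reverse peeling order: each peeled key's slots end up XOR-ing to its fingerprint.
        self.table = [0] * self.m
        for i, s in reversed(order):
            slots, fp = edges[i]
            value = fp
            for t in slots:
                if t != s:
                    value ^= self.table[t]
            self.table[s] = value
        # Assumption: an exact set stands in for the compact auxiliary structure.
        self.aux = frozenset(abandoned)

    def __contains__(self, key: bytes) -> bool:
        slots, fp = _slots_and_fingerprint(key, self.n, self.m, self.fp_bits)
        acc = 0
        for t in slots:
            acc ^= self.table[t]
        if acc == fp:
            return True  # same XOR query as XOR/fuse filters
        return key in self.aux  # auxiliary check: negatives and abandoned keys


if __name__ == "__main__":
    keys = [f"key-{i}".encode() for i in range(2000)]
    zor = ZorFilterSketch(keys)
    assert all(k in zor for k in keys)  # no false negatives
    print("abandoned fraction:", len(zor.aux) / len(keys))
```

Because peeled keys are assigned in reverse peeling order, each assignment writes only to a slot that no earlier-peeled key touches, so every peeled key's slots XOR to its fingerprint. Abandoned keys are never assigned and are recognized only by the auxiliary lookup, which is why the extra lookup is paid by negative queries and by the few abandoned keys.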