Data sketching is a critical tool for distinct counting: it represents multisets by compact summaries that admit fast cardinality estimates. Because sketches can be merged to summarize multiset unions, they are a basic building block in data warehouses. Although many practical sketches for cardinality estimation exist, none provides privacy when merging. We propose the first practical cardinality sketches that are simultaneously mergeable and differentially private (DP) while achieving low empirical error. They introduce a novel randomized algorithm for performing logical operations on noisy bits, a tight privacy analysis, and provably optimal estimation. Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data.
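To make the notion of mergeability concrete, the following minimal sketch shows a classical, non-private bitmap sketch (linear counting) whose merge is a bitwise OR. It is an illustration only, not the algorithm proposed here: it adds no noise and offers no privacy guarantee. The names `make_sketch`, `merge`, and `estimate`, and the bitmap size `M`, are hypothetical choices made for this example.

```python
import hashlib
import math

M = 4096  # number of bits in the sketch (hypothetical choice for this example)


def _bucket(item: str, m: int = M) -> int:
    """Hash an item to one of m bit positions."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def make_sketch(items, m: int = M) -> int:
    """Summarize a multiset as an m-bit bitmap stored in a Python int.

    Duplicates hash to the same bit, so the sketch depends only on
    the set of distinct items.
    """
    bits = 0
    for item in items:
        bits |= 1 << _bucket(item, m)
    return bits


def merge(a: int, b: int) -> int:
    """The sketch of a union is the bitwise OR of the two sketches."""
    return a | b


def estimate(bits: int, m: int = M) -> float:
    """Linear-counting estimate: -m * ln(fraction of bits still zero)."""
    zeros = m - bin(bits).count("1")
    if zeros == 0:
        return float("inf")  # bitmap saturated; m is too small for this input
    return -m * math.log(zeros / m)


# Two overlapping multisets whose union has 1500 distinct items.
left = make_sketch(f"user{i}" for i in range(1000))
right = make_sketch(f"user{i}" for i in range(500, 1500))
print(estimate(merge(left, right)))
```

Merging the two summaries gives exactly the bitmap that would have been built from the combined data, which is the property that makes such sketches useful in data warehouses. Because every bit here is exact, the sketch can reveal whether a particular item is present; the sketches proposed in this work instead operate on noisy bits.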