Counting the number of distinct elements distributed over multiple data holders is a fundamental problem with many real-world applications, ranging from crowd counting to network monitoring. Although a number of space- and computation-efficient sketch methods for cardinality estimation (e.g., the Flajolet-Martin sketch and the HyperLogLog sketch) have been proposed to solve this problem, they do not protect the privacy of each data holder's personal dataset. A recently proposed protocol successfully implements the well-known Flajolet-Martin (FM) sketch on a secret-sharing-based multiparty computation (MPC) framework to solve the problem of private distributed cardinality estimation (PDCE); however, we observe that this MPC-FM protocol is not differentially private. In addition, the MPC-FM protocol is computationally expensive, which limits its use by data holders with limited computational resources. To address these issues, we propose DP-DICE, a novel protocol for PDCE that is both computationally efficient and differentially private. Experimental results show that DP-DICE achieves a speedup of orders of magnitude and reduces the estimation error severalfold compared with state-of-the-art methods under the same security requirements.
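To make the setting concrete, the following is a minimal sketch of a plain, non-private Flajolet-Martin estimator in which each data holder builds a local sketch and the sketches are merged by bitwise OR. It is neither DP-DICE nor the MPC-FM protocol and offers no privacy protection. The function names (`fm_sketch`, `merge`, `estimate`), the parameters (`num_bitmaps`, `width`), the use of SHA-256 as the hash function, and the toy data are all assumptions made for illustration.

```python
import hashlib

PHI = 0.77351  # bias-correction constant from the Flajolet-Martin analysis


def lowest_set_bit(h: int, width: int) -> int:
    """0-based index of the lowest set bit of h, capped at width - 1."""
    if h == 0:
        return width - 1
    return min((h & -h).bit_length() - 1, width - 1)


def fm_sketch(items, num_bitmaps: int = 64, width: int = 32):
    """Build num_bitmaps FM bitmaps, one per (hypothetical) seeded hash."""
    bitmaps = [0] * num_bitmaps
    for item in items:
        for j in range(num_bitmaps):
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            h = int.from_bytes(digest[:4], "big")  # 32-bit hash value
            bitmaps[j] |= 1 << lowest_set_bit(h, width)
    return bitmaps


def merge(a, b):
    """Sketch of a union = bitwise OR of the two sketches."""
    return [x | y for x, y in zip(a, b)]


def lowest_zero_bit(b: int) -> int:
    """0-based index of the lowest unset bit of b."""
    return (~b & (b + 1)).bit_length() - 1


def estimate(bitmaps) -> float:
    """Average the lowest-zero-bit index over bitmaps, then rescale."""
    mean_r = sum(lowest_zero_bit(b) for b in bitmaps) / len(bitmaps)
    return 2 ** mean_r / PHI


# Three data holders with overlapping sets; the union has 3000 distinct items.
holders = [range(0, 1500), range(1000, 2500), range(2000, 3000)]
merged = fm_sketch(holders[0])
for data in holders[1:]:
    merged = merge(merged, fm_sketch(data))
est = estimate(merged)
print(f"estimated distinct count: {est:.0f} (true: 3000)")
```

Because merging is a bitwise OR, the merged sketch is identical to the sketch of the union of all datasets, so duplicates across holders are not double counted. The merged bitmaps are computed in the clear here, which is the step that a private protocol would have to protect.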