Local Differential Privacy (LDP) is now widely adopted in large-scale systems to collect and analyze sensitive data while preserving users' privacy. However, almost all LDP protocols rely on a semi-trust model in which users are curious-but-honest, an assumption that rarely holds in real-world scenarios. Recent works show that many LDP protocols suffer poor estimation accuracy under malicious threat models. Although a few works have proposed countermeasures against these attacks, they all require prior knowledge of either the attack pattern or the distribution of poison values, which is impractical because attackers can easily evade such countermeasures. In this paper, we adopt a general opportunistic-and-colluding threat model and propose a multi-group Differential Aggregation Protocol (DAP) to improve the accuracy of mean estimation under LDP. Unlike existing works, which detect poison values on an individual basis, DAP mitigates the overall impact of poison values on the estimated mean. It relies on a new probing mechanism, the Expectation-Maximization Filter (EMF), to estimate features of the attackers. In addition to EMF, DAP includes two EMF post-processing procedures (EMF* and CEMF*) and a group-wise mean aggregation scheme that optimizes the final estimated mean to achieve the smallest variance. Extensive experiments on both synthetic and real-world datasets demonstrate that DAP outperforms state-of-the-art solutions.
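The abstract does not spell out how the group-wise aggregation reaches the smallest variance. As a rough illustration of the general idea only, and not DAP's actual scheme, the sketch below assumes each group yields an independent, unbiased mean estimate with a known positive variance, and combines them by standard inverse-variance weighting. The function name `combine_group_means` and its inputs are hypothetical.

```python
from typing import Sequence, Tuple


def combine_group_means(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine per-group mean estimates by inverse-variance weighting.

    Assumption (not from the paper): the group estimates are independent,
    unbiased, and have known positive variances. Under that assumption this
    weighting gives the minimum-variance linear unbiased combination.
    Returns (combined_mean, combined_variance).
    """
    if len(means) == 0 or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("all variances must be positive")

    # Each group's weight is proportional to its precision, 1 / variance.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    combined_mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    combined_variance = 1.0 / total_precision
    return combined_mean, combined_variance


if __name__ == "__main__":
    # Two hypothetical groups: the noisier one (variance 4.0) gets less weight.
    print(combine_group_means([0.0, 10.0], [1.0, 4.0]))
```

The combined variance is never larger than the smallest group variance, which is why a noisy group still helps rather than hurts when it is weighted this way.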