Local Differential Privacy (LDP) is now widely adopted in large-scale systems to collect and analyze sensitive data while preserving users' privacy. However, almost all LDP protocols rely on a semi-honest model in which users are curious but honest, an assumption that rarely holds in real-world scenarios. Recent works show that many LDP protocols suffer poor estimation accuracy under malicious threat models. Although a few works have proposed countermeasures against these attacks, they all require prior knowledge of either the attacking pattern or the poison value distribution. This requirement is impractical, because attackers can easily evade such countermeasures. In this paper, we adopt a general opportunistic-and-colluding threat model and propose a multi-group Differential Aggregation Protocol (DAP) to improve the accuracy of mean estimation under LDP. Unlike existing works, which detect poison values on an individual basis, DAP mitigates the overall impact of poison values on the estimated mean. It relies on a new probing mechanism, EMF (Expectation-Maximization Filter), to estimate features of the attackers. In addition to EMF, DAP consists of two EMF post-processing procedures (EMF* and CEMF*) and a group-wise mean aggregation scheme that optimizes the final estimated mean to achieve the smallest variance. Extensive experiments on both synthetic and real-world datasets demonstrate the superior performance of DAP over state-of-the-art solutions.
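The abstract does not give the formula behind the group-wise mean aggregation. One standard way to combine several independent, unbiased group estimates into a single estimate with the smallest variance is inverse-variance weighting. The sketch below illustrates that generic technique only, assuming each group's mean estimate is unbiased, independent of the others, and has a known positive variance. The function name `combine_group_means` is hypothetical and is not DAP's actual scheme.

```python
from typing import Sequence, Tuple


def combine_group_means(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Hypothetical helper: inverse-variance weighting of group estimates.

    Assumes the group estimates are unbiased and mutually independent.
    Among linear unbiased combinations, weighting each estimate by
    1 / variance minimizes the variance of the result, which equals
    1 / sum(1 / variance_i).
    """
    if len(means) != len(variances) or len(means) == 0:
        raise ValueError("need one variance per group mean, and at least one group")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Precision (inverse variance) of each group's estimate.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    # Precision-weighted average of the group means.
    combined_mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    # Variance of the combined estimate, never larger than any single group's.
    combined_variance = 1.0 / total_precision
    return combined_mean, combined_variance


if __name__ == "__main__":
    mean, var = combine_group_means([0.30, 0.50], [0.01, 0.04])
    print(mean, var)  # mean ≈ 0.34, variance ≈ 0.008
```

In this usage example, the lower-variance group (0.01) receives four times the weight of the other group. The combined variance of about 0.008 is smaller than either group's variance alone, which is why a weighting of this kind, rather than a plain average, targets the smallest variance.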