Local Differential Privacy (LDP) is now widely adopted in large-scale systems to collect and analyze sensitive data while preserving users' privacy. However, almost all LDP protocols rely on a semi-trusted model in which users are honest-but-curious, an assumption that rarely holds in real-world scenarios. Recent work shows that many LDP protocols suffer poor estimation accuracy under malicious threat models. Although a few countermeasures have been proposed, they all require prior knowledge of either the attack pattern or the distribution of poison values, which is impractical because attackers can easily evade such assumptions. In this paper, we adopt a general opportunistic-and-colluding threat model and propose a multi-group Differential Aggregation Protocol (DAP) to improve the accuracy of mean estimation under LDP. Unlike all existing works, which detect poison values on an individual basis, DAP mitigates the overall impact of poison values on the estimated mean. It relies on a new probing mechanism, the Expectation-Maximization Filter (EMF), to estimate features of the attackers. In addition to EMF, DAP comprises two EMF post-processing procedures (EMF* and CEMF*) and a group-wise mean aggregation scheme that optimizes the final estimated mean to achieve the smallest variance. Extensive experiments on both synthetic and real-world datasets demonstrate that DAP outperforms state-of-the-art solutions.
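The idea of combining group-wise means so that the final estimate has the smallest variance can be illustrated with standard inverse-variance weighting. This is only a minimal sketch, not DAP's actual aggregation scheme, whose weights are not given in this abstract. It assumes each group yields an independent, unbiased mean estimate with a known variance; the function name `combine_group_means` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def combine_group_means(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine per-group mean estimates by inverse-variance weighting.

    Assumption (not from the paper): the group estimates are independent
    and unbiased, and their variances are known. Under that assumption,
    weights proportional to 1/variance minimize the variance of the
    combined unbiased estimate.
    """
    if len(means) == 0 or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Precision of a group = 1 / variance of its mean estimate.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    # Weighted average with weights precision_i / total_precision.
    combined_mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    # Variance of the combined estimate under the independence assumption.
    combined_variance = 1.0 / total_precision
    return combined_mean, combined_variance
```

With this weighting, a group whose estimate is noisier (for example, one perturbed under a smaller privacy budget) contributes less to the final mean, and the combined variance is never larger than that of the best single group.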