This paper establishes the strict optimality in precision for frequency estimation under local differential privacy (LDP). We prove that a frequency estimator with a symmetric, extremal configuration and a constant support size equal to an optimized value is sufficient to achieve maximum precision. We further show that the communication cost of such an optimal estimator can be as low as $\log_2\left(\frac{d(d-1)}{2}+1\right)$, where $d$ denotes the dictionary size, and we propose an algorithm that generates this optimal estimator. In addition, we introduce a modified Count-Mean Sketch and demonstrate that it is practically indistinguishable from the theoretical optimum when the dictionary is sufficiently large (e.g., $d = 100$ for a privacy factor of $\epsilon = 1$). We compare existing methods with the proposed optimal estimator to provide selection guidelines for practical deployment. Finally, we evaluate these estimators experimentally and find that the empirical results are consistent with our theoretical derivations.
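To give a sense of scale for the communication bound $\log_2(\frac{d(d-1)}{2}+1)$, the following minimal sketch evaluates it for a given dictionary size. The function name `min_communication_cost` is hypothetical, and reading the result as a number of bits per report is an assumption based on the base-2 logarithm; the sketch only evaluates the stated formula and does not implement the estimator itself.

```python
import math


def min_communication_cost(d: int) -> float:
    """Evaluate log2(d(d-1)/2 + 1) for a dictionary of size d.

    Hypothetical helper: it only computes the bound quoted in the abstract.
    Interpreting the value as bits per report is an assumption.
    """
    if d < 2:
        raise ValueError("dictionary size d must be at least 2")
    # d(d-1) is always even, so integer division is exact here.
    num_outputs = d * (d - 1) // 2 + 1
    return math.log2(num_outputs)


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        cost = min_communication_cost(d)
        # math.ceil gives the whole number of bits needed under the bit reading.
        print(f"d={d:5d}  bound={cost:.2f}  rounded up={math.ceil(cost)}")
```

For $d = 100$, the argument of the logarithm is $4951$, so the bound is roughly 12.27, which rounds up to 13 if a whole number of bits is required.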