This paper establishes the strict precision optimality of frequency and distribution estimation under local differential privacy (LDP). We prove that a linear estimator with a symmetric, extremal configuration and a constant support size set to an optimized value suffices to achieve the theoretical lower bound on the $\mathcal{L}_2$ loss for both frequency and distribution estimation; the theoretical $\mathcal{L}_1$ lower bound is achieved asymptotically. Furthermore, we show that the communication cost of such an optimal estimator can be as low as $\log_2\left(\frac{d(d-1)}{2}+1\right)$ bits, where $d$ denotes the dictionary size, and we propose an algorithm to construct this optimal estimator. In addition, we introduce a modified Count-Mean Sketch and demonstrate that it is practically indistinguishable from the theoretical optimum once the dictionary size is sufficiently large (e.g., $d = 100$ for privacy parameter $\varepsilon = 1$). We compare existing methods with the proposed optimal estimator to provide selection guidelines for practical deployment. Finally, we evaluate these estimators experimentally and show that the empirical results are consistent with our theoretical derivations.