This paper shows that a group of state-of-the-art locally differentially private (LDP) algorithms for frequency estimation are equivalent to the private Count-Mean Sketch (CMS) algorithm under different parameter choices. We therefore revisit the private CMS: we correct errors in the original CMS paper regarding expectation and variance, modify the CMS implementation to eliminate its existing bias, and explore optimized parameters for CMS that minimize the worst-case mean squared error (MSE), $l_1$ loss, and $l_2$ loss. Additionally, we prove that pairwise-independent hashing is sufficient for CMS, which reduces its communication cost to the logarithm of the cardinality of the set of all possible values (i.e., the dictionary). As a result, the optimized CMS is proven, theoretically and empirically, to be the only algorithm optimized for reducing the worst-case MSE, $l_1$ loss, and $l_2$ loss when dealing with a very large dictionary. Furthermore, we demonstrate that randomness is necessary to ensure the correctness of CMS, and that the communication cost of CMS, though low, is unavoidable regardless of whether the randomness is public or private.
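To make the setting concrete, the following is a minimal sketch of a hash-then-randomize frequency estimator in the spirit of a private count-mean sketch. It is not the construction or the parameterization analyzed in the paper. It assumes a standard pairwise-independent hash family of the form $((a x + b) \bmod p) \bmod m$, generalized randomized response over the $m$ buckets, and integer-encoded dictionary values; the names `PRIME`, `pairwise_hash`, `client_report`, and `estimate_frequency` are hypothetical. Each client sends only the pair $(a, b)$ and one bucket index, so the message length grows with $\log p$, which illustrates why the communication cost can be logarithmic in the dictionary size.

```python
import math
import random

# Assumption: a prime larger than the dictionary size (here 2^31 - 1).
PRIME = 2_147_483_647


def pairwise_hash(a, b, x, m):
    """Hash integer x into m buckets with h(x) = ((a*x + b) mod PRIME) mod m."""
    return ((a * x + b) % PRIME) % m


def keep_probability(m, epsilon):
    """Probability that randomized response over m buckets reports the true bucket."""
    return math.exp(epsilon) / (math.exp(epsilon) + m - 1)


def client_report(value, m, epsilon, rng):
    """Return (a, b, noisy_bucket) for one client holding `value`."""
    # The hash parameters are drawn independently of the private value.
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    bucket = pairwise_hash(a, b, value, m)
    if rng.random() >= keep_probability(m, epsilon):
        # Replace the true bucket with one of the other m - 1 buckets, uniformly.
        other = rng.randrange(m - 1)
        bucket = other if other < bucket else other + 1
    return a, b, bucket


def estimate_frequency(reports, candidate, m, epsilon):
    """Debiased estimate of the fraction of clients holding `candidate`.

    A client holding `candidate` matches with probability keep_probability;
    any other client matches with probability about 1/m, assuming the hash
    collides with probability about 1/m. Requires m >= 2 and epsilon > 0.
    """
    n = len(reports)
    hits = sum(1 for a, b, z in reports if pairwise_hash(a, b, candidate, m) == z)
    p_keep = keep_probability(m, epsilon)
    return (hits / n - 1.0 / m) / (p_keep - 1.0 / m)
```

In this sketch the number of buckets `m` and the privacy budget `epsilon` are free parameters; choosing them to minimize a particular loss is the kind of optimization the paper addresses and is not attempted here.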