In federated frequency estimation (FFE), multiple clients collaborate to estimate the frequencies of their collective data by communicating with a server under the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures the server can access only the sum of client-held vectors. For single-round FFE, count sketching is known to be nearly information-theoretically optimal for the fundamental accuracy-communication trade-off [Chen et al., 2022]. However, we show that in the more practical multi-round FFE setting, simple adaptations of count sketching are strictly sub-optimal, and we propose a novel hybrid sketching algorithm that is provably more accurate. We also address the following fundamental question: how should a practitioner set the sketch size so that it adapts to the hardness of the underlying problem? We propose a two-phase approach that allows a smaller sketch size for simpler problems (e.g., near-sparse or light-tailed distributions). We conclude by showing how differential privacy can be added to our algorithm and by verifying its superior performance through extensive experiments on large-scale datasets.
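To make the single-round baseline concrete, the following is a minimal sketch of count sketching under a sum-only aggregation constraint. It is not the hybrid algorithm proposed in this work. The function names (`make_hashes`, `encode`, `decode`), the dense hash tables, and the plain `sum` standing in for SecSum are all illustrative assumptions. The property it relies on is that a count sketch is linear, so the sum of client sketches equals the sketch of the summed histogram.

```python
import numpy as np

def make_hashes(domain_size, rows, width, seed=0):
    """Shared randomness (assumed to be agreed on by all clients and the server):
    a bucket index and a random sign for every (row, item) pair."""
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, width, size=(rows, domain_size))
    signs = rng.choice(np.array([-1, 1]), size=(rows, domain_size))
    return buckets, signs

def encode(local_counts, buckets, signs, width):
    """Client side: compress a local histogram into a rows x width sketch."""
    rows = buckets.shape[0]
    sketch = np.zeros((rows, width))
    for r in range(rows):
        # np.add.at accumulates correctly when several items share a bucket.
        np.add.at(sketch[r], buckets[r], signs[r] * local_counts)
    return sketch

def decode(summed_sketch, buckets, signs):
    """Server side: median-over-rows estimate of every item's total count."""
    rows = buckets.shape[0]
    per_row = np.stack(
        [signs[r] * summed_sketch[r, buckets[r]] for r in range(rows)]
    )
    return np.median(per_row, axis=0)

if __name__ == "__main__":
    domain_size, rows, width, num_clients = 1000, 5, 200, 50
    buckets, signs = make_hashes(domain_size, rows, width)

    # Hypothetical client data: 20 samples each from a heavy-tailed distribution.
    rng = np.random.default_rng(1)
    probs = 1.0 / np.arange(1, domain_size + 1)
    probs /= probs.sum()
    client_counts = rng.multinomial(20, probs, size=num_clients)

    # Stand-in for SecSum: the server only ever sees this sum of sketches.
    summed = sum(encode(c, buckets, signs, width) for c in client_counts)

    estimate = decode(summed, buckets, signs)
    truth = client_counts.sum(axis=0)
    print("max abs error:", np.abs(estimate - truth).max())
```

Here `width` plays the role of the sketch size discussed above: each client sends `rows * width` numbers instead of `domain_size`, at the cost of collision error that grows as the sketch shrinks.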