We study the problem of privacy-preserving $k$-means clustering in the horizontally federated setting. Existing federated approaches based on secure computation incur substantial overhead and do not offer output privacy. At the same time, differentially private (DP) $k$-means algorithms assume a trusted central curator and do not extend to federated settings. Naively combining the secure computation and DP solutions results in a protocol with impractical overhead. Instead, our work enhances both the DP and the secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By using the computational DP model, we design a lightweight approach based on secure aggregation that achieves a speed-up of four orders of magnitude over state-of-the-art related work. Furthermore, we not only match the utility of the state of the art in the central model of DP, but improve on it by taking advantage of constrained clustering techniques.
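To make the idea of a secure-aggregation-based DP $k$-means step concrete, the following is a minimal single-process simulation, not the protocol proposed in this work. Everything in it is an illustrative assumption: the function names, the fixed-point parameters `MOD` and `SCALE`, pairwise additive masks derived from shared seeds, and Gaussian noise split evenly across clients. It also assumes that points are already bounded (for example, clipped to $[-1, 1]^d$), which is needed for the noise scale to mean anything, and it uses Python's non-cryptographic `random` module purely for reproducibility.

```python
import random

MOD = 2 ** 61    # ring size for masked arithmetic (assumed parameter)
SCALE = 2 ** 16  # fixed-point scaling factor (assumed parameter)

def encode(x):
    """Map a float to a fixed-point element of Z_MOD."""
    return round(x * SCALE) % MOD

def decode(v):
    """Map a ring element back to a signed float."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE

def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts, flattened into one vector."""
    k, d = len(centroids), len(centroids[0])
    stats = [0.0] * (k * (d + 1))
    for p in points:
        # Assign the point to its nearest centroid (squared Euclidean distance).
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        for t in range(d):
            stats[j * (d + 1) + t] += p[t]
        stats[j * (d + 1) + d] += 1.0
    return stats

def masked_update(i, n, stats, sigma, pair_seeds, rng):
    """Client i adds its noise share, encodes, then applies pairwise masks."""
    share = sigma / n ** 0.5  # n shares of variance sigma^2 / n sum to sigma^2
    vec = [encode(s + rng.gauss(0.0, share)) for s in stats]
    for j in range(n):
        if j == i:
            continue
        # Both members of a pair expand the same seed into the same mask.
        prg = random.Random(pair_seeds[(min(i, j), max(i, j))])
        sign = 1 if i < j else -1  # opposite signs, so the masks cancel
        vec = [(v + sign * prg.randrange(MOD)) % MOD for v in vec]
    return vec

def aggregate(masked):
    """Server sums the masked vectors; only the noisy total is revealed."""
    return [decode(sum(col) % MOD) for col in zip(*masked)]

def new_centroids(agg, centroids):
    """Divide noisy sums by noisy counts; keep the old centroid if count < 1."""
    k, d = len(centroids), len(centroids[0])
    out = []
    for j in range(k):
        cnt = agg[j * (d + 1) + d]
        if cnt < 1.0:
            out.append(list(centroids[j]))
        else:
            out.append([agg[j * (d + 1) + t] / cnt for t in range(d)])
    return out

def run_round(clients, centroids, sigma, seed=0):
    """Simulate one federated Lloyd iteration across all clients."""
    rng = random.Random(seed)  # simulation only; not cryptographically secure
    n = len(clients)
    pair_seeds = {(a, b): rng.getrandbits(64)
                  for a in range(n) for b in range(a + 1, n)}
    masked = [masked_update(i, n, local_stats(pts, centroids), sigma, pair_seeds, rng)
              for i, pts in enumerate(clients)]
    return new_centroids(aggregate(masked), centroids)
```

In this sketch the server only ever sees masked vectors and their sum, so the noise that protects the output can be added in a distributed way rather than by a trusted curator. A real deployment would need cryptographic key agreement for the pairwise seeds, handling of client dropouts, and a noise distribution suited to finite-precision arithmetic, none of which is modelled here.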