As a staple of data analysis and unsupervised learning, private clustering has been widely studied under various privacy models. Centralized differential privacy was the first of these, and the problem has since been studied in the local and shuffle models as well. In each case, the goal is to design an algorithm that privately computes a clustering with the smallest possible error. The study of each variant gave rise to new algorithms, so the landscape of private clustering algorithms is quite intricate. In this paper, we show that a 20-year-old algorithm can be slightly modified to work in any of these models. This provides a unified picture: while matching almost all previously known results, it allows us to improve some of them and to extend the approach to a new privacy model, the continual observation setting, where the input changes over time and the algorithm must output a new solution at each time step.