In this paper, we study differentially private (DP) algorithms for computing the geometric median (GM) of a dataset: given $n$ points $x_1,\dots,x_n$ in $\mathbb{R}^d$, the goal is to find a point $\theta$ that minimizes the sum of the Euclidean distances to these points, i.e., $\sum_{i=1}^{n} \|\theta - x_i\|_2$. Off-the-shelf methods, such as DP-GD, require strong a priori knowledge locating the data within a ball of radius $R$, and the excess risk of the algorithm depends linearly on $R$. We ask: can we design an efficient and private algorithm with an excess error guarantee that scales with the (unknown) radius containing the majority of the datapoints? Our main contribution is a pair of polynomial-time DP algorithms for private GM with an excess error guarantee that scales with the effective diameter of the datapoints. Additionally, we propose an inefficient algorithm based on the inverse smooth sensitivity mechanism, which satisfies the more restrictive notion of pure DP. We complement our results with a lower bound and demonstrate the optimality of our polynomial-time algorithms in terms of sample complexity.
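To make the objective concrete, the following is a minimal, non-private sketch of the quantity being minimized, $\sum_{i=1}^{n} \|\theta - x_i\|_2$, together with the classical Weiszfeld iteration for approximating its minimizer. It is not one of the DP algorithms described above and adds no noise. The function names `gm_objective` and `weiszfeld`, the iteration count, and the toy dataset are assumptions made purely for illustration.

```python
import numpy as np

def gm_objective(theta, points):
    """Sum of Euclidean distances from theta to every row of points."""
    return float(np.linalg.norm(points - theta, axis=1).sum())

def weiszfeld(points, iters=200, eps=1e-12):
    """Non-private Weiszfeld iteration (illustrative only, no privacy guarantee)."""
    theta = points.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(iters):
        dists = np.linalg.norm(points - theta, axis=1)
        # Inverse-distance weights; eps guards against division by zero
        weights = 1.0 / np.maximum(dists, eps)
        theta = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return theta

# Hypothetical toy data: four clustered points and one distant outlier.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]])
theta = weiszfeld(points)
print(theta, gm_objective(theta, points))
```

In this toy dataset the mean is pulled far toward the outlier, while the approximate geometric median stays near the cluster containing the majority of the points. That is the regime in which an error guarantee tied to the radius of the majority, rather than to a worst-case bound $R$, is the natural target.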