Statistical tools that satisfy rigorous privacy guarantees are necessary for modern data analysis. It is well known that robustness against contamination is linked to differential privacy. Despite this, the use of multivariate medians for differentially private and robust multivariate location estimation has not been systematically studied. We develop novel finite-sample performance guarantees for differentially private multivariate depth-based medians, and these guarantees are essentially sharp. Our results cover commonly used depth functions, such as the halfspace (or Tukey) depth, the spatial depth, and the integrated dual depth. We show that under Cauchy marginals, the cost of heavy-tailed location estimation outweighs the cost of privacy. We demonstrate our results numerically using a Gaussian contamination model in dimensions up to d = 100, and compare them to a state-of-the-art private mean estimation algorithm. As a by-product of our investigation, we prove concentration inequalities for the output of the exponential mechanism about the maximizer of the population objective function. These inequalities apply to objective functions that satisfy a mild regularity condition.
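To make the idea of a private depth-based median concrete, the following is a minimal sketch of the exponential mechanism applied to the sample spatial depth. It is an illustration under stated assumptions, not the estimator analyzed here: the function names (`spatial_depth`, `private_depth_median`), the restriction to a finite candidate set, the replace-one notion of neighboring datasets, and the resulting sensitivity bound of 2/n are all choices made for this example.

```python
import math
import random


def spatial_depth(x, data):
    """Sample spatial depth of x: 1 - || average of unit vectors (x - X_i)/||x - X_i|| ||."""
    d = len(x)
    total = [0.0] * d
    for xi in data:
        diff = [a - b for a, b in zip(x, xi)]
        norm = math.sqrt(sum(c * c for c in diff))
        if norm > 0.0:  # convention: an observation equal to x contributes the zero vector
            for j in range(d):
                total[j] += diff[j] / norm
    return 1.0 - math.sqrt(sum(t * t for t in total)) / len(data)


def private_depth_median(data, candidates, epsilon, rng):
    """Exponential mechanism over a finite candidate set (a simplifying assumption)."""
    n = len(data)
    # Assumed sensitivity: replacing one observation changes one unit vector,
    # so the averaged vector (and hence the depth) moves by at most 2/n.
    sensitivity = 2.0 / n
    scores = [spatial_depth(c, data) for c in candidates]
    top = max(scores)
    # Subtract the maximum score before exponentiating for numerical stability.
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical usage: 200 standard Gaussian points in the plane, candidates on a grid.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
grid = [(i / 2, j / 2) for i in range(-6, 7) for j in range(-6, 7)]
estimate = private_depth_median(data, grid, epsilon=1.0, rng=rng)
```

Sampling a candidate with probability proportional to exp(ε · depth / (2Δ)), where Δ bounds the sensitivity of the depth, is the standard form of the exponential mechanism. A continuous version would sample from a density over the whole space rather than from a grid.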