In differential privacy, statistics of a sensitive dataset are privatized by introducing random noise. Most privacy analyses provide privacy bounds specifying a noise level sufficient to achieve a target privacy guarantee. Sometimes, these bounds are pessimistic and suggest adding excessive noise, which overwhelms the meaningful signal. It remains unclear whether such high noise levels are truly necessary or merely an artifact of the proof techniques. This paper explores whether we can obtain sharp privacy characterizations that identify the smallest noise level required to achieve a target privacy level for a given mechanism. We study this problem in the context of differentially private principal component analysis, where the goal is to privatize the leading principal components (PCs) of a dataset with n samples and p features. We analyze the exponential mechanism for this problem in a model-free setting and provide sharp utility and privacy characterizations in the high-dimensional limit ($p\rightarrow\infty$). Our privacy result shows that, in high dimensions, detecting the presence of a target individual in the dataset using the privatized PCs is exactly as hard as distinguishing two Gaussians with slightly different means, where the mean difference depends on certain spectral properties of the dataset. Our privacy analysis combines the hypothesis-testing formulation of privacy guarantees proposed by Dong, Roth, and Su (2022) with classical contiguity arguments due to Le Cam to obtain sharp high-dimensional privacy characterizations.
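The hypothesis-testing view of this guarantee can be made concrete: when membership detection reduces to distinguishing $N(0,1)$ from $N(\mu,1)$, the optimal trade-off between type-I error $\alpha$ and type-II error is the Gaussian trade-off function $f_\mu(\alpha)=\Phi(\Phi^{-1}(1-\alpha)-\mu)$ of Dong, Roth, and Su (2022). The sketch below is illustrative only; the function name is ours, and $\mu$ stands in generically for the spectrum-dependent mean difference derived in the paper.

```python
from scipy.stats import norm

def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Minimal type-II error of a level-alpha test of N(0,1) vs N(mu,1).

    mu is a placeholder for the spectrum-dependent mean difference;
    mu = 0 means perfect privacy (the hypotheses are indistinguishable).
    """
    return norm.cdf(norm.ppf(1 - alpha) - mu)

# At mu = 0, no test beats random guessing: f(alpha) = 1 - alpha.
print(gaussian_tradeoff(0.05, 0.0))  # → 0.95
# Larger mu (weaker privacy) lowers the attainable type-II error.
print(gaussian_tradeoff(0.05, 1.0) < 0.95)  # → True
```

A smaller value of $\mu$ thus corresponds directly to a stronger privacy guarantee for the privatized PCs.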