In differential privacy, random noise is introduced to privatize summary statistics of a sensitive dataset before releasing them. The noise level determines the privacy loss, which quantifies how easily an adversary can detect a target individual's presence in the dataset using the published statistic. Most privacy analyses provide upper bounds on the privacy loss. Sometimes, these bounds offer weak privacy guarantees unless the noise level is so high that it overwhelms the meaningful signal. It is unclear whether such high noise levels are necessary or a limitation of loose and pessimistic privacy bounds. This paper explores whether it is possible to obtain sharp privacy characterizations that determine the exact privacy loss of a mechanism on a given dataset. We study this problem in the context of differentially private principal component analysis (PCA), where the goal is to privatize the leading principal components of a dataset with $n$ samples and $p$ features. We analyze the exponential mechanism in a model-free setting and provide sharp utility and privacy characterizations in the high-dimensional limit ($p \rightarrow \infty$). We show that in high dimensions, detecting a target individual's presence using privatized PCs is exactly as hard as distinguishing between two Gaussians with slightly different means, where the mean difference depends on certain spectral properties of the dataset. Our analysis combines the hypothesis-testing formulation of privacy guarantees proposed by Dong, Roth, and Su (2022) with Le Cam's contiguity arguments.
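To make the Gaussian comparison concrete, the sketch below evaluates the trade-off curve for testing $N(0,1)$ against $N(\mu,1)$ in the hypothesis-testing framework of Dong, Roth, and Su (2022): the smallest achievable type II error at type I error level $\alpha$ is $\Phi(\Phi^{-1}(1-\alpha) - \mu)$. The function name `gaussian_tradeoff` and the values of `mu` used here are illustrative assumptions. The abstract says only that the mean difference depends on spectral properties of the dataset and gives no formula for it, so none is assumed here.

```python
from statistics import NormalDist

# Standard normal distribution, used for both the CDF and its inverse.
_STD_NORMAL = NormalDist()


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Smallest type II error of any level-alpha test of N(0, 1) vs N(mu, 1).

    alpha: allowed type I error, strictly between 0 and 1.
    mu:    non-negative mean difference (a hypothetical input here; the
           paper derives it from the data, but the abstract gives no formula).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if mu < 0.0:
        raise ValueError("mu must be non-negative")
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


if __name__ == "__main__":
    # Illustrative values only: larger mu means the adversary's test is easier.
    for mu in (0.0, 0.5, 1.0, 2.0):
        print(f"mu={mu:.1f}  type II error at alpha=0.05: "
              f"{gaussian_tradeoff(0.05, mu):.4f}")
```

When `mu` is zero the two Gaussians coincide and the curve reduces to $1 - \alpha$, meaning the adversary can do no better than guessing. As `mu` grows, the attainable type II error shrinks and the target individual becomes easier to detect.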