In differential privacy, random noise is introduced to privatize summary statistics of a sensitive dataset before releasing them. The noise level determines the privacy loss, which quantifies how easily an adversary can detect a target individual's presence in the dataset using the published statistic. Most privacy analyses provide upper bounds on the privacy loss. Sometimes, these bounds offer weak privacy guarantees unless the noise level is so high that it overwhelms the meaningful signal. It is unclear whether such high noise levels are truly necessary or merely an artifact of loose, pessimistic privacy bounds. This paper explores whether it is possible to obtain sharp privacy characterizations that determine the exact privacy loss of a mechanism on a given dataset. We study this problem in the context of differentially private principal component analysis (PCA), where the goal is to privatize the leading principal components (PCs) of a dataset with $n$ samples and $p$ features. We analyze the exponential mechanism in a model-free setting and provide sharp utility and privacy characterizations in the high-dimensional limit ($p \rightarrow \infty$). We show that in high dimensions, detecting a target individual's presence using privatized PCs is exactly as hard as distinguishing between two Gaussians with slightly different means, where the mean difference depends on certain spectral properties of the dataset. Our analysis combines the hypothesis-testing formulation of privacy guarantees proposed by Dong, Roth, and Su (2022) with Le Cam's contiguity arguments.
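To make the Gaussian comparison concrete, the following minimal sketch evaluates the trade-off function for testing $N(0,1)$ against $N(\mu,1)$, which in the hypothesis-testing formulation of Dong, Roth, and Su gives the smallest achievable type II error at type I error level $\alpha$: $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$. The function name `gaussian_tradeoff` and the value of `mu` are illustrative assumptions; the mean difference the paper derives from the spectral properties of the dataset is not stated here and is not reproduced.

```python
from statistics import NormalDist

# Standard normal distribution, used for Phi and its inverse.
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Smallest type II error when testing N(0, 1) against N(mu, 1)
    at type I error level alpha, i.e. Phi(Phi^{-1}(1 - alpha) - mu).

    `mu` is a hypothetical placeholder for the mean difference; it is
    NOT the dataset-dependent quantity derived in the paper.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if mu < 0.0:
        raise ValueError("mu must be non-negative")
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


if __name__ == "__main__":
    # Larger mean differences make the target individual easier to detect,
    # so the adversary's best achievable type II error shrinks.
    for mu in (0.0, 0.5, 1.0, 2.0):
        print(f"mu={mu:.1f}  type II error at alpha=0.05: "
              f"{gaussian_tradeoff(0.05, mu):.4f}")
```

With `mu = 0` the two Gaussians coincide and the adversary can do no better than random guessing, so the type II error equals $1 - \alpha$; as `mu` grows, the error decreases and the privacy guarantee weakens.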