Estimating a covariance matrix and its associated principal components is a fundamental problem in contemporary statistics. While optimal estimation procedures with well-understood properties have been developed, the increasing demand for privacy preservation introduces new complexities to this classical problem. In this paper, we study optimal differentially private Principal Component Analysis (PCA) and covariance estimation within the spiked covariance model. We precisely characterize the sensitivity of the eigenvalues and eigenvectors under this model and establish the minimax rates of convergence for estimating both the principal components and the covariance matrix. These rates hold up to logarithmic factors and encompass general Schatten norms, including the spectral, Frobenius, and nuclear norms as special cases. We propose computationally efficient differentially private estimators and prove their minimax optimality for sub-Gaussian distributions, up to logarithmic factors, by establishing matching minimax lower bounds. Notably, compared to the existing literature, our results accommodate a diverging rank and a broader range of signal strengths, and they remain valid even when the sample size is much smaller than the dimension, provided the signal strength is sufficiently strong. Both simulation studies and real data experiments demonstrate the merits of our method.