This paper proposes new methodologies for practical differentially private (DP) estimation and inference in high-dimensional linear regression. We first introduce a DP Bayesian Information Criterion (DP-BIC) for selecting the unknown sparsity parameter in differentially private sparse linear regression (DP-SLR), eliminating the need for prior knowledge of model sparsity, which the existing literature requires. Next, we develop a DP debiased algorithm that leverages the inherent sparsity of high-dimensional linear regression models to enable privacy-preserving inference on a particular subset of regression parameters. Additionally, we address private feature selection by introducing a DP multiple testing procedure that controls the false discovery rate (FDR), allowing accurate and privacy-preserving identification of significant predictors in the regression model. Through extensive simulations and real data analyses, we demonstrate the effectiveness of our proposed methods in conducting inference for high-dimensional linear models while safeguarding privacy and controlling the FDR.