We study model selection in high-dimensional sparse linear regression under the differential privacy framework. Specifically, we consider differentially private best subset selection and analyze its utility guarantee. We adopt the well-known exponential mechanism to select the best model and, under a certain margin condition, establish its strong model recovery property. However, the exponential mechanism must sample from an exponentially large search space, which poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and show that its mixing time to the stationary distribution is polynomial in the problem parameters $n$, $p$, and $s$. Using this mixing property, we further establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk. Finally, we present illustrative simulations that support our theoretical findings.
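To make the sampling step concrete, the following is a minimal sketch of a Metropolis-Hastings walk over size-$s$ subsets whose stationary distribution has the exponential-mechanism form, with probability proportional to $\exp(-\varepsilon \cdot \mathrm{RSS}(\gamma) / (2\Delta))$. It is an illustration under assumptions and not necessarily the construction analyzed in the paper: the residual sum of squares as the score, the single-swap proposal, and the names `rss`, `mh_subset_sampler`, `eps`, and `sensitivity` are all hypothetical choices. In particular, `sensitivity` stands for a user-supplied bound $\Delta$ on how much the score can change between neighboring datasets; the sketch does not verify that bound, so by itself it carries no privacy guarantee.

```python
import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    Xs = X[:, sorted(subset)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return float(resid @ resid)


def mh_subset_sampler(X, y, s, eps, sensitivity, n_steps, rng):
    """Run a swap-based Metropolis-Hastings walk over subsets of size s.

    Target (assumed): pi(gamma) proportional to
    exp(-eps * RSS(gamma) / (2 * sensitivity)).
    `sensitivity` is an assumed, user-supplied bound and is not checked here.
    """
    n, p = X.shape
    current = set(rng.choice(p, size=s, replace=False).tolist())
    current_rss = rss(X, y, current)
    scale = eps / (2.0 * sensitivity)

    for _ in range(n_steps):
        inside = sorted(current)
        outside = [j for j in range(p) if j not in current]

        # Symmetric proposal: swap one selected index for one unselected index.
        drop = inside[rng.integers(len(inside))]
        add = outside[rng.integers(len(outside))]
        proposal = (current - {drop}) | {add}
        proposal_rss = rss(X, y, proposal)

        # Because the proposal is symmetric, the acceptance ratio reduces
        # to the ratio of target probabilities.
        log_accept = -scale * (proposal_rss - current_rss)
        if np.log(rng.random()) < log_accept:
            current, current_rss = proposal, proposal_rss

    return sorted(current)
```

Calling `mh_subset_sampler(X, y, s=2, eps=1.0, sensitivity=1.0, n_steps=200, rng=np.random.default_rng(0))` on an `(n, p)` design matrix returns a sorted list of `s` distinct column indices. Each step refits only one size-$s$ model, so the walk never enumerates all $\binom{p}{s}$ candidate subsets; how many steps are needed for the output to be close to the target distribution is exactly the mixing-time question the abstract refers to.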