We initiate the study of differentially private learning in the proportional dimensionality regime, in which the number of data samples $n$ and problem dimension $d$ approach infinity at rates proportional to one another, meaning that $d / n \to \delta$ as $n \to \infty$ for an arbitrary, given constant $\delta \in (0, \infty)$. This setting is significantly more challenging than that of all prior theoretical work in high-dimensional differentially private learning, which, despite the name, has assumed that $\delta = 0$ or is sufficiently small for problems of sample complexity $O(d)$, a regime typically considered "low-dimensional" or "classical" by modern standards in high-dimensional statistics. We provide sharp theoretical estimates of the error of several well-studied differentially private algorithms for robust linear regression and logistic regression, including output perturbation, objective perturbation, and noisy stochastic gradient descent, in the proportional dimensionality regime. The $1 + o(1)$ factor precision of our error estimates enables a far more nuanced understanding of the price of privacy of these algorithms than that afforded by existing, coarser analyses, which are essentially vacuous in the regime we consider. We incorporate several probabilistic tools that have not previously been used to analyze differentially private learning algorithms, such as a modern Gaussian comparison inequality and recent universality laws with origins in statistical physics.
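To make the algorithmic setting concrete, below is a minimal sketch of one of the mechanisms named above, output perturbation, applied to ridge-regularized logistic regression. This is an illustration under stated assumptions, not the paper's exact procedure or notation: the function name, the ridge weight `lam`, and the feature-norm bound `R` are hypothetical, and the sensitivity bound $2R/(n\lambda)$ is the standard one for a $\lambda$-strongly-convex objective with $R$-Lipschitz per-sample losses (as in Chaudhuri–Monteleoni–Sarwate-style analyses), released here via the Gaussian mechanism.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def dp_logreg_output_perturbation(X, y, eps, delta_priv, lam=1.0, R=1.0, rng=None):
    """Output perturbation for ridge-regularized logistic regression.

    A minimal sketch, not the paper's exact algorithm. Assumes rows of X
    have Euclidean norm at most R and labels y lie in {0, 1}; the ridge
    term makes the objective lam-strongly convex, so the replace-one L2
    sensitivity of the empirical minimizer is 2R / (n * lam).
    """
    rng = rng or np.random.default_rng()
    n, d = X.shape

    def obj(theta):
        z = X @ theta
        # Mean logistic loss plus the ridge penalty (lam / 2) * ||theta||^2.
        return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * theta @ theta

    def grad(theta):
        return X.T @ (expit(X @ theta) - y) / n + lam * theta

    theta_hat = minimize(obj, np.zeros(d), jac=grad, method="L-BFGS-B").x

    # Gaussian mechanism calibrated to the minimizer's sensitivity. Note that
    # delta_priv is the (eps, delta)-DP parameter, distinct from the aspect
    # ratio delta = d / n used in the proportional dimensionality regime.
    sigma = (2.0 * R / (n * lam)) * np.sqrt(2.0 * np.log(1.25 / delta_priv)) / eps
    return theta_hat + rng.normal(scale=sigma, size=d)
```

To exercise the proportional dimensionality regime in a simulation, one would fix an aspect ratio such as $\delta = 0.5$ and scale the dimension with the sample size, e.g. `n = 2000; d = int(0.5 * n)`, so that $d/n$ stays pinned to $\delta$ as both grow.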