Training machine learning models with differential privacy (DP) has received increasing interest in recent years. Among the most popular algorithms for training differentially private models are differentially private stochastic gradient descent (DPSGD) and its variants, in which gradients are clipped and perturbed with noise at each step. Given the increasing use of DPSGD, we ask: is DPSGD alone sufficient to find a good minimizer for every dataset under privacy constraints? As a first step towards answering this question, we show that even in the simple case of linear classification, (private) feature preprocessing is vital for differentially private optimization, in contrast to non-private optimization. Specifically, we first show theoretically that there exists an example where, without feature preprocessing, DPSGD incurs a privacy error proportional to the maximum norm of the features over all samples. We then propose an algorithm called DPSGD-F, which combines DPSGD with feature preprocessing, and prove that for classification tasks it incurs a privacy error proportional to the diameter of the features, $\max_{x, x' \in D} \|x - x'\|_2$. Finally, we demonstrate the practicality of our algorithm on image classification benchmarks.
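The two quantities contrasted in the abstract, the maximum feature norm and the feature diameter, can be illustrated with a minimal NumPy sketch. It is not the authors' DPSGD-F algorithm. It assumes logistic loss with labels in {-1, +1}, and it centers the features with a plain, non-private mean purely for illustration, whereas the abstract calls for private preprocessing. All function names and parameters (`dpsgd_step`, `clip_norm`, `noise_multiplier`, and so on) are hypothetical, and no privacy accounting is performed.

```python
import numpy as np

def clip_rows(g, clip_norm):
    """Scale each row of g so that its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return g * scale

def dpsgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One clipped, noised gradient step for a linear classifier (logistic loss)."""
    margins = y * (X @ w)
    # Gradient of log(1 + exp(-m)) w.r.t. w is -y * x / (1 + exp(m)).
    coeff = -y / (1.0 + np.exp(margins))
    per_example = coeff[:, None] * X
    clipped = clip_rows(per_example, clip_norm)
    # Gaussian noise scaled to the clipping norm (illustrative calibration only).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / X.shape[0]
    return w - lr * noisy_mean

def max_norm(X):
    """Maximum L2 norm of any feature vector."""
    return float(np.linalg.norm(X, axis=1).max())

def diameter(X):
    """Maximum pairwise L2 distance between feature vectors."""
    diffs = X[:, None, :] - X[None, :, :]
    return float(np.linalg.norm(diffs, axis=2).max())

rng = np.random.default_rng(0)
# Synthetic features clustered far from the origin: large norm, small diameter.
X = rng.normal(size=(50, 3)) + 100.0
y = np.where(X[:, 0] > 100.0, 1.0, -1.0)

# Assumption: non-private mean centering stands in for private preprocessing.
X_centered = X - X.mean(axis=0)

w = dpsgd_step(np.zeros(3), X_centered, y, lr=0.1,
               clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

After centering, every feature norm is bounded by the diameter of the original data, because each centered point is an average of differences $x_i - x_j$. This is the intuition behind replacing a dependence on the maximum norm with a dependence on the diameter.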