Training machine learning models with differential privacy (DP) has received increasing interest in recent years. One of the most popular algorithms for training differentially private models is differentially private stochastic gradient descent (DPSGD) and its variants, in which gradients are clipped and combined with noise at each step. Given the increasing use of DPSGD, we ask: is DPSGD alone sufficient to find a good minimizer for every dataset under privacy constraints? Towards answering this question, we show that even in the simple case of linear classification, and unlike in non-private optimization, (private) feature preprocessing is vital for differentially private optimization. In detail, we first show theoretically that there exists an example where, without feature preprocessing, DPSGD incurs an optimality gap proportional to the maximum Euclidean norm of the features over all samples, $\max_{x \in D} \|x\|_2$. We then propose an algorithm called DPSGD-F, which combines DPSGD with feature preprocessing, and prove that for classification tasks it incurs an optimality gap proportional to the diameter of the features, $\max_{x, x' \in D} \|x - x'\|_2$. Finally, we demonstrate the practicality of our algorithm on image classification benchmarks.
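To make the two quantities in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. It shows one generic DPSGD step for a linear classifier with logistic loss (per-example gradients clipped, then Gaussian noise added), together with helpers for the maximum feature norm and the feature diameter. The function names, the choice of logistic loss, and the parameters `clip_norm` and `noise_multiplier` are assumptions made for illustration; the abstract does not specify the preprocessing used by DPSGD-F, so none is implemented here.

```python
import numpy as np


def dpsgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One generic DPSGD step for logistic loss, labels y in {-1, +1}.

    Illustrative sketch only; names and loss are assumptions.
    """
    # Per-example gradients of log(1 + exp(-y * <w, x>)).
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))           # shape (n,)
    grads = coeffs[:, None] * X                     # shape (n, d)

    # Clip each per-example gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(grads, axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale[:, None]

    # Add Gaussian noise scaled to the clipping norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(X)
    return w - lr * noisy_mean


def max_feature_norm(X):
    """max over x in D of ||x||_2."""
    return np.linalg.norm(X, axis=1).max()


def feature_diameter(X):
    """max over x, x' in D of ||x - x'||_2 (O(n^2) pairwise computation)."""
    diffs = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diffs, axis=2).max()
```

As a usage note: for features clustered far from the origin, for example `X = rng.normal(size=(8, 3)) + 100.0`, `max_feature_norm(X)` is large while `feature_diameter(X)` stays small, and the diameter is unchanged if a constant vector is subtracted from every row. Subtracting the exact dataset mean, as one might do to try this out, is not itself differentially private; the abstract states that the preprocessing must be private.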