Modern deep learning techniques focus on extracting intricate information from data to achieve accurate predictions. However, training datasets may be crowdsourced and contain sensitive information, such as personal contact details, financial data, and medical records. As a result, there is growing emphasis on developing privacy-preserving training algorithms for neural networks that maintain good performance while protecting privacy. In this paper, we investigate the generalization and privacy performance of the differentially private gradient descent (DP-GD) algorithm, a private variant of gradient descent (GD) that injects additional noise into the gradients at each iteration. Moreover, we identify a concrete learning task in which DP-GD achieves better generalization than GD when training two-layer Huberized ReLU convolutional neural networks (CNNs). Specifically, we show that, under mild conditions, a small signal-to-noise ratio causes GD to produce trained models with poor test accuracy, whereas DP-GD yields trained models with good test accuracy and privacy guarantees provided the signal-to-noise ratio is not too small. This indicates that DP-GD has the potential to improve model performance while ensuring privacy protection in certain learning tasks. Numerical simulations further support our theoretical results.
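To make the DP-GD update concrete, the sketch below shows one noisy gradient step in the standard Gaussian-mechanism form with per-sample gradient clipping. This is a minimal illustration, not the paper's exact specification: the function name `dp_gd_step`, the parameters `clip_norm` and `noise_std`, and the noise calibration are illustrative assumptions.

```python
import numpy as np

def dp_gd_step(params, per_sample_grads, lr, clip_norm, noise_std, rng):
    """One differentially private gradient descent (DP-GD) update.

    A minimal sketch (assumed Gaussian-mechanism form): each per-sample
    gradient is clipped to norm `clip_norm` to bound its sensitivity, the
    clipped gradients are averaged, and isotropic Gaussian noise scaled by
    `noise_std` (chosen to meet a target (epsilon, delta) budget) is added
    before the descent step.
    """
    n = per_sample_grads.shape[0]
    # Clip each per-sample gradient so no single example dominates the update.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    # Average the clipped gradients, then perturb with Gaussian noise whose
    # standard deviation is calibrated to the clipping norm and batch size.
    noisy_grad = clipped.mean(axis=0) + rng.normal(
        scale=noise_std * clip_norm / n, size=params.shape
    )
    return params - lr * noisy_grad

# Example usage on synthetic per-sample gradients (illustrative values only).
rng = np.random.default_rng(0)
theta = np.zeros(5)
grads = rng.normal(size=(32, 5))  # per-sample gradients for a batch of 32
theta = dp_gd_step(theta, grads, lr=0.1, clip_norm=1.0, noise_std=1.0, rng=rng)
```

The added noise is what distinguishes DP-GD from plain GD; the analysis in the paper concerns how this noise interacts with the signal-to-noise ratio of the data to affect the test accuracy of the trained CNN.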