Differentially Private Stochastic Gradient Descent (DP-SGD) limits the amount of private information that deep learning models can memorize during training. It does so by clipping the model's gradients and adding noise to them, so networks with more parameters require proportionally stronger perturbation. As a result, large models struggle to learn useful information, which makes training with DP-SGD exceedingly difficult on more challenging tasks. Recent research has addressed this challenge through training adaptations such as heavy data augmentation and large batch sizes. However, these techniques further increase the computational overhead of DP-SGD and reduce its practical applicability. In this work, we propose using the principle of sparse model design to solve such complex tasks with fewer parameters, higher accuracy, and in less time, which makes it a promising direction for DP-SGD. We achieve this sparsity by design by introducing equivariant convolutional networks for model training with Differential Privacy. Using equivariant networks, we show that small and efficient architectures can outperform current state-of-the-art models with substantially lower computational requirements. On CIFAR-10, we achieve an increase of up to $9\%$ in accuracy while reducing the computation time by more than $85\%$. Our results are a step towards efficient model architectures that make optimal use of their parameters and bridge the privacy-utility gap between private and non-private deep learning for computer vision.
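To make the clip-and-noise mechanism concrete, the following is a minimal sketch of a single DP-SGD update in plain Python. It is not the implementation used in this work. It assumes the standard formulation with per-example gradient clipping and Gaussian noise, and the names `clip_norm`, `noise_multiplier`, `dp_sgd_step`, and `noise_norm` are illustrative choices. Privacy accounting is omitted entirely.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy update: clip each gradient, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Assumption: noise standard deviation per coordinate is
    # noise_multiplier * clip_norm, as in the usual Gaussian mechanism.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


def noise_norm(num_params, clip_norm, noise_multiplier, rng):
    """L2 norm of a single noise draw over num_params coordinates."""
    sigma = noise_multiplier * clip_norm
    return math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(num_params)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy per-example gradients (hypothetical values, not from any experiment).
    grads = [[3.0, 4.0, 0.0], [0.1, 0.0, 0.0]]
    print(dp_sgd_step([0.0, 0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.0, rng=rng))
    # Each clipped gradient has norm at most clip_norm, while the norm of
    # the added noise grows with the number of parameters.
    for d in (100, 10_000):
        print(d, noise_norm(d, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

In this sketch each clipped gradient has norm at most `clip_norm`, whereas the norm of the noise vector grows roughly with the square root of the parameter count. This is one way to see why models with fewer parameters are attractive under DP-SGD.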