Machine learning models are susceptible to a variety of attacks that can erode trust in their deployment. These threats include attacks against the privacy of training data and adversarial examples that jeopardize model accuracy. Differential privacy and randomized smoothing are effective defenses that provide certifiable guarantees against these two threats, respectively; however, it is not well understood how implementing either defense impacts the other. In this work, we argue that it is possible to achieve both privacy guarantees and certified robustness simultaneously. We provide a framework called DP-CERT for integrating certified robustness through randomized smoothing into differentially private model training. For instance, compared to differentially private stochastic gradient descent on CIFAR10, DP-CERT leads to a 12-fold increase in certified accuracy and a 10-fold increase in the average certified radius at the expense of a 1.2% drop in accuracy. Through in-depth per-sample metric analysis, we show that the certified radius correlates with the local Lipschitz constant and the smoothness of the loss surface. This provides a new way to diagnose when private models will fail to be robust.
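To make the notion of a certified radius concrete, the following is a minimal sketch of the standard randomized smoothing certificate, in which a lower bound p on the top-class probability under Gaussian noise of scale sigma yields an L2 radius of sigma * Phi^{-1}(p). It is an illustration under stated assumptions rather than the DP-CERT implementation: the function names, the toy classifier, and the use of a Hoeffding bound (swapped in here for the tighter Clopper-Pearson interval that is commonly used) are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def smoothed_vote(base_classifier, x, sigma, n_samples, rng):
    """Classify n_samples Gaussian perturbations of x and return
    (most frequent class, number of votes it received)."""
    counts = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[base_classifier(noisy)] += 1
    top_class, top_count = counts.most_common(1)[0]
    return top_class, top_count


def hoeffding_lower_bound(successes, n, alpha):
    """One-sided (1 - alpha) lower confidence bound on a binomial
    proportion via Hoeffding's inequality (conservative; assumed here
    in place of a Clopper-Pearson interval)."""
    return successes / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def certified_radius(p_lower, sigma):
    """L2 radius sigma * Phi^{-1}(p_lower); returns 0.0 (abstain)
    when the lower bound does not exceed 1/2."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if p_lower <= 0.5:
        return 0.0
    p_lower = min(p_lower, 1.0 - 1e-12)  # inv_cdf is undefined at 1.0
    return sigma * NormalDist().inv_cdf(p_lower)


def toy_classifier(x):
    """Hypothetical base classifier: sign of the first coordinate."""
    return 1 if x[0] > 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    sigma, n = 0.5, 1000
    cls, votes = smoothed_vote(toy_classifier, [2.0, 0.0], sigma, n, rng)
    p_low = hoeffding_lower_bound(votes, n, alpha=0.001)
    print(cls, certified_radius(p_low, sigma))
```

In practice the class to certify is usually selected on a separate, smaller batch of noise samples before the bound is estimated; the sketch merges both steps for brevity, so its output should be read as illustrative only. Averaging such radii over a test set, counting abstentions as zero, gives a per-sample quantity of the kind that the average certified radius summarizes.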