Machine learning models are susceptible to a variety of attacks that can erode trust, including attacks against the privacy of training data and adversarial examples that jeopardize model accuracy. Differential privacy and certified robustness are effective frameworks for combating these two threats, respectively, as they each provide future-proof guarantees. However, we show that standard differentially private model training is insufficient for providing strong certified robustness guarantees. Indeed, combining differential privacy and certified robustness in a single system is non-trivial, leading previous works to introduce complex training schemes that lack flexibility. In this work, we present DP-CERT, a simple and effective method that achieves both privacy and robustness guarantees simultaneously by integrating randomized smoothing into standard differentially private model training. Compared to the leading prior work, DP-CERT gives up to a 2.5% increase in certified accuracy for the same differential privacy guarantee on CIFAR10. Through in-depth per-sample metric analysis, we find that larger certifiable radii correlate with smaller local Lipschitz constants, and show that DP-CERT effectively reduces Lipschitz constants compared to other differentially private training methods. The code is available at github.com/layer6ai-labs/dp-cert.
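To make the notion of a certifiable radius concrete, the following is a minimal sketch of the certification step in standard randomized smoothing, where a smoothed classifier is certified within an L2 radius of sigma * Phi^{-1}(p) and p is a lower bound on the probability of the top class under Gaussian noise. It is not taken from the DP-CERT code base. The names `toy_classifier`, `smooth_certify`, `hoeffding_lower_bound`, and `certified_radius` are hypothetical, the classifier is a toy linear rule rather than a differentially private network, and a Hoeffding bound is swapped in for the Clopper-Pearson interval commonly used in practice so that the sketch needs only the standard library.

```python
import math
import random
from statistics import NormalDist


def certified_radius(p_lower: float, sigma: float) -> float:
    """L2 radius sigma * Phi^{-1}(p_lower); returns 0.0 when no radius can be certified."""
    if p_lower <= 0.5:
        return 0.0
    p_lower = min(p_lower, 1.0 - 1e-12)  # inv_cdf is undefined at exactly 1.0
    return sigma * NormalDist().inv_cdf(p_lower)


def hoeffding_lower_bound(successes: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) lower confidence bound on a Bernoulli mean.

    Assumption: Hoeffding is used here for simplicity; it is valid but looser
    than the Clopper-Pearson interval.
    """
    slack = math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    return max(0.0, successes / n - slack)


def smooth_certify(classifier, x, sigma, n0=100, n=1000, alpha=0.001, seed=0):
    """Return (label, radius) for the smoothed classifier at x, or (None, 0.0) to abstain."""
    rng = random.Random(seed)

    def noisy_votes(num_samples):
        # Count the base classifier's predictions on Gaussian-perturbed copies of x.
        counts = {}
        for _ in range(num_samples):
            noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
            label = classifier(noisy)
            counts[label] = counts.get(label, 0) + 1
        return counts

    # A small batch guesses the top class; a fresh, larger batch estimates its probability.
    guess = noisy_votes(n0)
    top = max(guess, key=guess.get)
    estimate = noisy_votes(n)
    p_lower = hoeffding_lower_bound(estimate.get(top, 0), n, alpha)
    if p_lower <= 0.5:
        return None, 0.0
    return top, certified_radius(p_lower, sigma)


def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the coordinates sum to a positive value."""
    return 1 if sum(x) > 0 else 0


label, radius = smooth_certify(toy_classifier, [2.0, 2.0], sigma=0.25)
print(label, round(radius, 3))
```

For this toy input the true distance to the decision boundary is 4 / sqrt(2), so any certified radius should fall below that value. In a real pipeline the base classifier would be the trained network and the radius would be computed per test sample.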