DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Differential privacy (DP), a rigorous mathematical definition that quantifies privacy leakage, has become a well-accepted standard for privacy protection. Combined with powerful machine learning techniques, differentially private machine learning (DPML) is increasingly important. DP-SGD, the most classic DPML algorithm, incurs a significant loss of utility, which hinders the deployment of DPML in practice. Many recent studies propose improved algorithms based on DP-SGD to mitigate this utility loss. However, these studies are isolated from one another and do not comprehensively measure the performance of the improvements they propose. More importantly, no comprehensive study compares these improved DPML algorithms across utility, defensive capability, and generalizability. We fill this gap with a holistic measurement of improved DPML algorithms, covering utility and defense capability against membership inference attacks (MIAs) on image classification tasks. We first present a taxonomy of where each improvement is located in the machine learning life cycle. Based on this taxonomy, we perform an extensive measurement study of the improved DPML algorithms. We also include state-of-the-art label differential privacy (Label DP) algorithms in the evaluation. According to our empirical results, DP can effectively defend against MIAs, and sensitivity-bounding techniques such as per-sample gradient clipping play an important role in the defense. We also explore some improvements that maintain model utility while defending against MIAs more effectively. Experiments show that Label DP algorithms incur less utility loss but are fragile to MIAs. To support our evaluation, we implement DPMLBench, a modular, reusable software tool that enables sensitive data owners to deploy DPML algorithms and serves as a benchmark for researchers and practitioners.
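
The abstract names per-sample gradient clipping as a sensitivity-bounding technique in DP-SGD. As a rough illustration only, the following sketch shows one DP-SGD-style update for a toy binary logistic regression in plain Python. It is not code from DPMLBench: the function names `clip_gradient` and `dp_sgd_step`, the model, and all hyperparameter values are assumptions made for this example, and the batch sampling and privacy accounting that a real DP guarantee requires are omitted.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale `grad` so that its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [g * scale for g in grad]


def dp_sgd_step(weights, batch, lr, clip_norm, noise_multiplier, rng):
    """One noisy update for binary logistic regression.

    `batch` is a list of (features, label) pairs with labels in {0.0, 1.0}.
    All names and parameters here are hypothetical, chosen for illustration.
    """
    summed = [0.0] * len(weights)
    for x, y in batch:
        # Per-sample gradient of the logistic loss: (sigmoid(w.x) - y) * x
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        grad = [(p - y) * xi for xi in x]
        # Clipping bounds each sample's contribution to the sum by clip_norm
        clipped = clip_gradient(grad, clip_norm)
        summed = [s + c for s, c in zip(summed, clipped)]
    # Gaussian noise with standard deviation noise_multiplier * clip_norm
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # Average the noisy sum over the batch and take a gradient step
    return [w - lr * n / len(batch) for w, n in zip(weights, noisy)]


rng = random.Random(0)
batch = [([1.0, 2.0], 1.0), ([-3.0, 0.5], 0.0), ([0.2, -0.1], 1.0)]
weights = dp_sgd_step([0.0, 0.0], batch, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
```

Because every clipped gradient has norm at most `clip_norm`, adding or removing one sample changes the summed gradient by at most `clip_norm`, which is why the noise scale is tied to that bound. Setting `noise_multiplier=0.0` reduces the step to plain clipped SGD, which can be useful for checking the clipping logic in isolation.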

