DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Differential privacy (DP), a rigorous mathematical definition that quantifies privacy leakage, has become a well-accepted standard for privacy protection. Combined with powerful machine learning techniques, differentially private machine learning (DPML) is increasingly important. DP-SGD, the most classic DPML algorithm, incurs a significant loss of utility, which hinders DPML's deployment in practice. Many recent studies have proposed improved algorithms based on DP-SGD to mitigate this utility loss. However, these studies are isolated from one another and do not comprehensively measure the performance of the improvements they propose. More importantly, comprehensive research comparing these improved DPML algorithms across utility, defensive capability, and generalizability is lacking. We fill this gap with a holistic measurement of improved DPML algorithms, covering utility and defense capability against membership inference attacks (MIAs) on image classification tasks. We first present a taxonomy of where improvements are located in the machine learning life cycle. Based on this taxonomy, we perform an extensive measurement study of the improved DPML algorithms. We also cover state-of-the-art label differential privacy (Label DP) algorithms in the evaluation. According to our empirical results, DP can effectively defend against MIAs, and sensitivity-bounding techniques such as per-sample gradient clipping play an important role in defense. We also explore some improvements that can maintain model utility and defend against MIAs more effectively. Experiments show that Label DP algorithms achieve less utility loss but are fragile to MIAs. To support our evaluation, we implement a modular, reusable software tool, DPMLBench, which enables sensitive data owners to deploy DPML algorithms and serves as a benchmark tool for researchers and practitioners.
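As a rough illustration of the per-sample gradient clipping mentioned above, the following is a minimal sketch of a single DP-SGD-style update for logistic regression in NumPy. It is not taken from DPMLBench. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical names chosen for this example, and no privacy accounting is performed.

```python
import numpy as np

def dp_sgd_step(weights, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One illustrative DP-SGD-style step for logistic regression.

    Assumption: binary labels y in {0, 1} and a linear model with no bias term.
    Returns the updated weights and the clipped per-sample gradients.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Per-sample gradient of the logistic loss: (sigmoid(x.w) - y) * x
    preds = 1.0 / (1.0 + np.exp(-(X @ weights)))
    per_sample_grads = (preds - y)[:, None] * X  # shape: (batch, dim)

    # Sensitivity bounding: rescale each sample's gradient to L2 norm <= clip_norm
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale

    # Add Gaussian noise scaled to the clipping bound, then average over the batch
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / len(X)

    return weights - lr * noisy_grad, clipped
```

Because every clipped gradient has norm at most `clip_norm`, a single sample can move the summed gradient by at most that amount, which is what lets the Gaussian noise be calibrated to it. Translating `noise_multiplier` into an (ε, δ) guarantee requires a separate privacy accountant, which this sketch omits.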

