In this pioneering work we formulate ExpM+NF, a method for training machine learning (ML) models on private data with a pre-specified differential privacy guarantee ($\varepsilon>0$, $\delta=0$), by using the Exponential Mechanism (ExpM) and an auxiliary Normalizing Flow (NF). We articulate theoretical benefits of ExpM+NF over Differentially Private Stochastic Gradient Descent (DPSGD), the state-of-the-art (SOTA) and de facto method for differentially private ML, and we empirically test ExpM+NF against DPSGD using the SOTA implementation (Opacus with PRV accounting) in multiple classification tasks on the Adult Dataset (census data) and the MIMIC-III Dataset (electronic healthcare records), using Logistic Regression and GRU-D, a deep learning recurrent neural network with ~20K-100K parameters. In all experiments, ExpM+NF achieves greater than 93% of the non-private training accuracy (AUC) for $\varepsilon \in [1\mathrm{e}{-3}, 1]$, exhibiting greater accuracy (higher AUC) and stronger privacy (lower $\varepsilon$ with $\delta=0$) than DPSGD. Differentially private ML generally considers $\varepsilon \in [1,10]$ to maintain reasonable accuracy; hence, ExpM+NF's ability to provide strong accuracy at orders of magnitude better privacy (smaller $\varepsilon$) substantially extends what is currently possible in differentially private ML. Training-time results show that ExpM+NF is comparable to (slightly faster than) DPSGD. Code for these experiments will be provided after review. Limitations and future directions are discussed.
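For readers unfamiliar with the Exponential Mechanism named above, the following is a minimal sketch of its textbook form over a finite candidate set: a candidate $r$ is sampled with probability proportional to $\exp(\varepsilon\, u(r) / (2\Delta u))$, which gives $\varepsilon$-differential privacy with $\delta=0$. This is not the ExpM+NF method itself, which targets a continuous parameter space and relies on a normalizing flow for sampling. The function name `exponential_mechanism` and its parameters `candidates`, `utility`, `sensitivity`, and `rng` are hypothetical, chosen only for illustration.

```python
import math
import random


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=None):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility(c) / (2 * sensitivity)).

    Illustrative sketch for a finite candidate set only; `sensitivity` is
    assumed to bound how much `utility` can change when one record of the
    underlying private dataset changes.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()

    # Scaled utilities (log of the unnormalized sampling weights).
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]

    # Subtract the maximum before exponentiating for numerical stability;
    # this rescales all weights equally and leaves the distribution unchanged.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]

    return rng.choices(candidates, weights=weights, k=1)[0]
```

As a usage illustration, `exponential_mechanism([0, 1, 2], lambda c: -abs(c - 1), sensitivity=1.0, epsilon=0.5)` returns one of the three candidates, favoring `1` only mildly. Smaller values of `epsilon` flatten the sampling distribution toward uniform (more privacy, less utility), while larger values concentrate it on the highest-utility candidate.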