Tuning the hyperparameters of differentially private (DP) machine learning (ML) algorithms often requires the use of sensitive data, and the chosen hyperparameter values may themselves leak private information. Recently, Papernot and Steinke (2022) proposed a class of DP hyperparameter tuning algorithms in which the number of random search samples is itself randomized. These algorithms commonly still increase the DP privacy parameter $\varepsilon$ considerably over non-tuned DP ML model training, and they can be computationally heavy because evaluating each hyperparameter candidate requires a new training run. We focus on lowering both the DP bounds and the computational cost of these methods by using only a random subset of the sensitive data for the hyperparameter tuning and by extrapolating the optimal values to a larger dataset. We provide a R\'enyi differential privacy analysis for the proposed method and experimentally show that it consistently leads to a better privacy-utility trade-off than the baseline method by Papernot and Steinke.
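To make the structure of the approach concrete, the following is a minimal Python sketch of random search with a randomized number of candidates, run on a random subset of the data. It is an illustration under stated assumptions, not the authors' implementation: the geometric distribution for the number of candidates, the Poisson-style subsampling with ratio `q`, and the names `sample_num_candidates`, `poisson_subsample`, `tune_on_subset`, and `train_and_score` are all hypothetical choices made here. The `train_and_score` callback stands in for a DP training run and provides no privacy guarantee by itself. The step that extrapolates the selected values to the larger dataset is omitted because the abstract does not specify it.

```python
import random


def sample_num_candidates(rng, p_stop=0.1):
    """Draw K >= 1 from a geometric distribution.

    Assumption: the geometric law is only one possible way to randomize
    the number of random search samples.
    """
    k = 1
    while rng.random() > p_stop:
        k += 1
    return k


def poisson_subsample(data, q, rng):
    """Keep each record independently with probability q."""
    return [x for x in data if rng.random() < q]


def tune_on_subset(data, candidates, train_and_score, q=0.1, p_stop=0.1, seed=0):
    """Random search over `candidates` using only a random subset of `data`.

    `train_and_score(subset, hp)` is a hypothetical callback that stands in
    for one DP training run and returns a score (higher is better).
    """
    rng = random.Random(seed)
    subset = poisson_subsample(data, q, rng)
    k = sample_num_candidates(rng, p_stop)

    best_hp, best_score = None, float("-inf")
    for _ in range(k):
        hp = rng.choice(candidates)           # one random search sample
        score = train_and_score(subset, hp)   # one training run on the subset
        if score > best_score:
            best_hp, best_score = hp, score
    return best_hp, k, len(subset)
```

Because every candidate is trained only on the subset, each evaluation is cheaper than a run on the full dataset, which is the source of the computational saving the abstract describes.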