Tuning the hyperparameters of differentially private (DP) machine learning (ML) algorithms often requires the use of sensitive data, which may leak private information via the chosen hyperparameter values. Recently, Papernot and Steinke (2022) proposed a class of DP hyperparameter tuning algorithms in which the number of random search samples is itself randomized. These algorithms still commonly increase the DP privacy parameter $\varepsilon$ considerably compared to non-tuned DP ML model training, and they can be computationally heavy because evaluating each hyperparameter candidate requires a new training run. We focus on lowering both the DP bounds and the computational cost of these methods by using only a random subset of the sensitive data for the hyperparameter tuning and by extrapolating the optimal values to a larger dataset. We provide a Rényi differential privacy analysis for the proposed method and experimentally show that it consistently leads to a better privacy-utility trade-off than the baseline method by Papernot and Steinke.
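The control flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and provides no privacy guarantee by itself: `train_and_score` is a hypothetical stand-in for one DP training run, `extrapolate` is a hypothetical user-supplied rule (the abstract does not specify one), and both the Poisson subsampling with rate `q` and the Poisson-distributed number of candidates with mean `mu` are assumptions made for illustration.

```python
import math
import random


def sample_poisson(mu, rng):
    """Draw K ~ Poisson(mu) with Knuth's multiplication method (fine for small mu)."""
    threshold = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def tune_on_subset(data, sample_candidate, train_and_score, extrapolate,
                   q=0.1, mu=10.0, seed=0):
    """Random search on a random subset with a randomized number of candidates.

    All callables are hypothetical placeholders:
      sample_candidate(rng)        -> one hyperparameter candidate
      train_and_score(cand, data)  -> quality score (stand-in for a DP training run)
      extrapolate(cand, n_sub, n)  -> candidate adjusted for the full dataset size
    """
    rng = random.Random(seed)

    # Assumption: keep each record independently with probability q.
    subset = [record for record in data if rng.random() < q]

    # The number of random search samples is itself random.
    num_candidates = sample_poisson(mu, rng)

    best = None  # (score, candidate)
    for _ in range(num_candidates):
        candidate = sample_candidate(rng)
        score = train_and_score(candidate, subset)
        if best is None or score > best[0]:
            best = (score, candidate)

    if best is None:
        # No candidate was drawn; the caller must fall back to a default.
        return None

    # Map the value tuned on the subset to the full dataset.
    return extrapolate(best[1], len(subset), len(data))
```

Each candidate is evaluated on roughly `q * len(data)` records instead of the full dataset, which is where the computational saving comes from in this sketch. Any privacy accounting has to be done separately.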