Tuning the hyperparameters of differentially private (DP) machine learning (ML) algorithms often requires the use of sensitive data, and this may leak private information via the hyperparameter values. Recently, Papernot and Steinke (2022) proposed a class of DP hyperparameter tuning algorithms in which the number of random search samples is itself randomized. These algorithms commonly still increase the DP privacy parameter $\varepsilon$ considerably compared with non-tuned DP ML model training, and they can be computationally heavy because evaluating each hyperparameter candidate requires a new training run. We focus on lowering both the DP bounds and the computational complexity of these methods by using only a random subset of the sensitive data for the hyperparameter tuning and by extrapolating the optimal values from the small dataset to a larger dataset. We provide a R\'enyi differential privacy analysis for the proposed method and experimentally show that it consistently leads to a better privacy-utility trade-off than the baseline method by Papernot and Steinke (2022).
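The control flow described in the abstract (random search with a randomized number of trials, run on a random subset of the data) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the fixed-size uniform subset, the geometric distribution for the number of trials, and the toy scoring function. The sketch provides no privacy guarantee by itself, because `train_and_score` stands in for a DP training run, and the extrapolation of the selected values to the full dataset is not modelled.

```python
import random


def sample_num_trials(gamma, rng):
    """Draw K >= 1 from a geometric distribution with success probability gamma.

    Assumption: the geometric distribution is used only as an example of a
    randomized number of trials.
    """
    k = 1
    while rng.random() >= gamma:
        k += 1
    return k


def tune_on_subset(data, candidates, train_and_score,
                   subset_size, gamma=0.2, seed=0):
    """Random search over `candidates`, evaluated on a random subset of `data`.

    `train_and_score(subset, hp)` is a hypothetical callback that would run one
    DP training on `subset` with hyperparameter `hp` and return a score.
    """
    rng = random.Random(seed)
    # Assumption: a uniformly random subset of fixed size.
    subset = rng.sample(data, subset_size)
    # The number of random search samples is itself random.
    num_trials = sample_num_trials(gamma, rng)

    best_hp, best_score = None, None
    for _ in range(num_trials):
        hp = rng.choice(candidates)          # one random search sample
        score = train_and_score(subset, hp)  # one training run on the subset
        if best_score is None or score > best_score:
            best_hp, best_score = hp, score
    return best_hp, best_score, num_trials, len(subset)


def toy_score(subset, learning_rate):
    """Toy stand-in for a DP training run: the score peaks at 0.3."""
    return -(learning_rate - 0.3) ** 2


if __name__ == "__main__":
    data = list(range(1000))
    candidates = [0.01, 0.1, 0.3, 1.0]
    hp, score, k, m = tune_on_subset(data, candidates, toy_score,
                                     subset_size=100)
    print(f"selected hp={hp} after {k} trials on {m} records")
```

Each trial trains only on the subset, so the cost per candidate scales with `subset_size` rather than with the full dataset size, which is the source of the computational saving the abstract refers to.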