Hyperparameter optimization, also known as hyperparameter tuning, is a widely used technique for improving model performance. However, when training private ML models, practitioners often overlook the privacy risks of hyperparameter optimization itself, which can expose sensitive information about the underlying dataset. Currently, the only existing approach to privacy-preserving hyperparameter optimization is to select hyperparameters uniformly at random for a number of runs and then report the best-performing one. In contrast, in non-private settings practitioners commonly use "adaptive" hyperparameter optimization methods, such as Gaussian process-based optimization, which choose the next candidate based on the outputs of previous runs. This gap between private and non-private hyperparameter optimization is a significant limitation. In this paper, we introduce DP-HyPO, a pioneering framework for "adaptive" private hyperparameter optimization that aims to bridge this gap. We provide a comprehensive differential privacy analysis of the framework, and we empirically demonstrate the effectiveness of DP-HyPO on a diverse set of real-world and synthetic datasets.
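The non-adaptive private baseline mentioned above (uniform random selection over several runs, then reporting the best) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method and not a privacy-accounted implementation. The names `private_random_search`, `sample_poisson`, and `train_and_score` are hypothetical. `train_and_score` stands in for a differentially private training run (for example, DP-SGD) that returns a validation score. Drawing the number of runs from a Poisson distribution is an assumption borrowed from prior work on private hyperparameter tuning, not a detail stated in this abstract. Note that each candidate is drawn independently of earlier scores; this non-adaptivity is what DP-HyPO aims to remove.

```python
import math
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

Hyperparams = Dict[str, float]


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def private_random_search(
    grid: Sequence[Hyperparams],
    train_and_score: Callable[[Hyperparams, random.Random], float],
    mean_runs: float,
    seed: int = 0,
) -> Tuple[Optional[Hyperparams], float, int]:
    """Hypothetical sketch of the uniform random-search baseline.

    Assumption: train_and_score is itself a DP training run; no privacy
    accounting is done here.
    """
    rng = random.Random(seed)
    # Assumption: a random (Poisson-distributed) number of runs.
    num_runs = sample_poisson(rng, mean_runs)
    best: Optional[Hyperparams] = None
    best_score = -math.inf
    for _ in range(num_runs):
        # Uniform and independent of previous scores: the non-adaptive part.
        hp = rng.choice(grid)
        score = train_and_score(hp, rng)
        if score > best_score:
            best, best_score = hp, score
    # If zero runs were drawn, nothing is reported (best is None).
    return best, best_score, num_runs


if __name__ == "__main__":
    grid = [{"lr": 10.0 ** e} for e in (-4, -3, -2, -1)]

    def toy_dp_training(hp: Hyperparams, rng: random.Random) -> float:
        # Hypothetical stand-in for a DP training run: noisy score peaking at lr = 1e-2.
        return 1.0 - 0.1 * abs(math.log10(hp["lr"]) + 2) + rng.gauss(0.0, 0.01)

    best, score, runs = private_random_search(grid, toy_dp_training, mean_runs=10.0, seed=1)
    print(runs, best, score)
```

An adaptive method would instead let the choice of `hp` depend on the scores observed so far; the privacy cost of doing so is what the paper's analysis addresses.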