Hyperparameter optimization, also known as hyperparameter tuning, is a widely used technique for improving model performance. When training private ML models, however, practitioners often overlook the privacy risks of hyperparameter optimization, which can expose sensitive information about the underlying dataset. Currently, the only existing approach to privacy-preserving hyperparameter optimization is to select hyperparameters uniformly at random for a number of runs and then report the best-performing one. In non-private settings, by contrast, practitioners commonly use ``adaptive'' hyperparameter optimization methods such as Gaussian process-based optimization, which select the next candidate based on information gathered from previous outputs. This gap between private and non-private hyperparameter optimization is a critical concern. In this paper, we introduce DP-HyPO, a framework for ``adaptive'' private hyperparameter optimization that aims to bridge this gap. To this end, we provide a comprehensive differential privacy analysis of the framework, and we empirically demonstrate the effectiveness of DP-HyPO on a diverse set of real-world datasets.
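
To make the contrast between non-adaptive and ``adaptive'' selection concrete, the following is a minimal Python sketch of the two control flows only. It is not the DP-HyPO algorithm and provides no privacy guarantee. The names `toy_score`, `uniform_search`, `adaptive_search`, and `boost`, the candidate list, and the reweighting rule are all assumptions made for illustration; the reweighting rule is a simple stand-in for a method such as Gaussian process-based optimization.

```python
import random


def toy_score(lr):
    # Hypothetical stand-in for "train a model with this hyperparameter and
    # return its validation score"; peaks at lr = 0.3.
    return -(lr - 0.3) ** 2


def uniform_search(candidates, num_runs, rng):
    # Non-adaptive: each run draws a candidate uniformly at random,
    # ignoring all earlier scores.
    best = None
    for _ in range(num_runs):
        h = rng.choice(candidates)
        s = toy_score(h)
        if best is None or s > best[1]:
            best = (h, s)
    return best


def adaptive_search(candidates, num_runs, rng, boost=2.0):
    # Adaptive: the sampling weights depend on earlier outputs. Here a
    # candidate's weight is multiplied by `boost` whenever it sets a new best
    # (an illustrative rule, not the one analyzed in the paper).
    weights = [1.0] * len(candidates)
    best = None
    for _ in range(num_runs):
        i = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        s = toy_score(candidates[i])
        if best is None or s > best[1]:
            best = (candidates[i], s)
            weights[i] *= boost
    return best


candidates = [0.01, 0.1, 0.3, 1.0, 3.0]  # hypothetical learning rates
print(uniform_search(candidates, 200, random.Random(0)))
print(adaptive_search(candidates, 200, random.Random(0)))
```

The only difference between the two functions is whether the sampling distribution is updated from previous outputs, which is the dependence that a private analysis of an adaptive method has to account for.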