An open problem in differentially private deep learning is hyperparameter optimization (HPO). DP-SGD introduces new hyperparameters and complicates existing ones, forcing researchers to painstakingly tune hyperparameters with hundreds of trials, which in turn makes it impossible to account for the privacy cost of HPO without destroying the utility. We propose an adaptive HPO method that uses cheap trials (in terms of privacy cost and runtime) to estimate optimal hyperparameters and scales them up. We obtain state-of-the-art performance on 22 benchmark tasks, across computer vision and natural language processing, across pretraining and finetuning, across architectures and a wide range of $\varepsilon \in [0.01,8.0]$, all while accounting for the privacy cost of HPO.