Bayesian optimization (BO) has become a popular strategy for global optimization of expensive real-world functions. Contrary to the common expectation that BO is suited to optimizing black-box functions, deploying BO successfully actually requires domain knowledge about those functions. Such domain knowledge often manifests in Gaussian process (GP) priors that specify initial beliefs about functions. However, even with expert knowledge, it is non-trivial to quantitatively define a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. In this work, we detail what pre-training entails for GPs using a KL-divergence-based loss function, and propose a new pre-training-based BO framework named HyperBO. Theoretically, we show bounded posterior predictions and near-zero regret for HyperBO without assuming the "ground truth" GP prior is known. To verify our approach in realistic model training setups, we collect a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art deep learning models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and classic multi-task BO benchmarks.
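To make the idea of a KL-divergence-based pre-training loss concrete, the following is a minimal sketch and not the HyperBO implementation. It assumes that several related functions have been evaluated at the same inputs, so that an empirical mean and covariance can be estimated across tasks and compared against the Gaussian induced by a candidate GP prior. The squared-exponential kernel, the zero prior mean, the direction of the divergence, and all names (`rbf_kernel`, `gaussian_kl`, `pretraining_loss`, `noise_var`, `jitter`) are illustrative assumptions rather than details taken from the text above.

```python
import numpy as np


def rbf_kernel(x, lengthscale, signal_var):
    """Squared-exponential kernel matrix for 1-D inputs x of shape (m,)."""
    sq_dist = (x[:, None] - x[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale**2)


def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for full-rank covariance matrices."""
    k = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    maha_term = diff @ np.linalg.solve(cov1, diff)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return float(0.5 * (trace_term + maha_term - k + logdet1 - logdet0))


def pretraining_loss(y, x, lengthscale, signal_var, noise_var=1e-2, jitter=1e-6):
    """KL between the empirical Gaussian of multi-task data and a GP prior.

    y has shape (n_tasks, m): each row is one function observed at the shared
    inputs x. Assumes n_tasks > m so the empirical covariance is full rank.
    """
    m = x.shape[0]
    emp_mean = y.mean(axis=0)
    emp_cov = np.cov(y, rowvar=False) + jitter * np.eye(m)
    prior_mean = np.zeros(m)  # assumption: zero-mean prior
    prior_cov = rbf_kernel(x, lengthscale, signal_var) + noise_var * np.eye(m)
    return gaussian_kl(emp_mean, emp_cov, prior_mean, prior_cov)


if __name__ == "__main__":
    # Synthetic "similar functions": draws from a GP with a hidden lengthscale.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 5)
    true_cov = rbf_kernel(x, 0.5, 1.0) + 1e-2 * np.eye(5)
    y = rng.multivariate_normal(np.zeros(5), true_cov, size=200)

    # Score a few candidate priors; a lower loss means a closer match.
    for candidate in (0.1, 0.5, 2.0):
        print(candidate, pretraining_loss(y, x, candidate, 1.0))
```

In practice such a loss would be minimized over the prior's parameters with a gradient-based optimizer; the sweep over three candidate lengthscales only illustrates that the loss can rank priors by how well they match the multi-task data.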