Bayesian optimization (BO) is a powerful sequential approach for seeking the global optimum of black-box functions in a sample-efficient way. Evaluating a black-box function can be expensive, so reducing the amount of labeled data required is desirable. For the first time, we introduce a teacher-student model, called $\texttt{TSBO}$, that enables semi-supervised learning in the context of BO: it uses large amounts of cheaply generated unlabeled data to improve the generalization of data query models. Our teacher-student model is uncertainty-aware and offers a practical mechanism for leveraging the pseudo labels generated for unlabeled data while managing the risk they carry. We show that the selection of unlabeled data is key to $\texttt{TSBO}$. We optimize unlabeled data sampling by drawing unlabeled data either from a dynamically fitted extreme value distribution or from a parameterized sampling distribution learned by minimizing the student feedback. $\texttt{TSBO}$ can operate in a learned latent space of reduced dimensionality, which makes it scalable to high-dimensional problems. $\texttt{TSBO}$ demonstrates significant sample efficiency in several global optimization tasks under tight labeled data budgets.
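To make the idea of uncertainty-aware pseudo-labeling concrete, the following is a minimal sketch and not the $\texttt{TSBO}$ implementation. It assumes a bootstrap ensemble of ridge regressors as the teacher, a weighted ridge regressor as the student, and a simple inverse-variance weighting rule; all function names, the polynomial feature map, and the toy objective are hypothetical choices made for illustration only.

```python
import numpy as np

def poly_features(x, degree=3):
    """Map 1-D inputs to polynomial features [1, x, x^2, ...] (illustrative choice)."""
    return np.vander(x, degree + 1, increasing=True)

def fit_weighted_ridge(X, y, w, lam=1e-3):
    """Solve min_b sum_i w_i (y_i - X_i b)^2 + lam * ||b||^2 in closed form."""
    A = X.T @ (w[:, None] * X) + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ (w * y))

def teacher_predict(x_lab, y_lab, x_unl, n_models=20, seed=0):
    """Hypothetical teacher: a bootstrap ensemble giving a pseudo-label mean and variance."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x_lab), len(x_lab))  # resample labeled data
        beta = fit_weighted_ridge(poly_features(x_lab[idx]), y_lab[idx],
                                  np.ones(len(idx)))
        preds.append(poly_features(x_unl) @ beta)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.var(axis=0)

def pseudo_label_weights(var):
    """Assumed weighting rule: down-weight uncertain pseudo labels; values in (0, 1]."""
    return 1.0 / (1.0 + var / (np.median(var) + 1e-12))

def fit_student(x_lab, y_lab, x_unl, pseudo_y, weights):
    """Student trained on labeled data (weight 1) plus weighted pseudo-labeled data."""
    X = poly_features(np.concatenate([x_lab, x_unl]))
    y = np.concatenate([y_lab, pseudo_y])
    w = np.concatenate([np.ones(len(x_lab)), weights])
    return fit_weighted_ridge(X, y, w)

# Toy usage: 8 expensive labeled points, 200 cheap unlabeled points.
rng = np.random.default_rng(1)
x_lab = rng.uniform(-1.0, 1.0, 8)
y_lab = np.sin(3.0 * x_lab)           # stand-in for the black-box objective
x_unl = rng.uniform(-1.0, 1.0, 200)   # uniform sampling; TSBO optimizes this step

pseudo_y, pseudo_var = teacher_predict(x_lab, y_lab, x_unl)
weights = pseudo_label_weights(pseudo_var)
student_beta = fit_student(x_lab, y_lab, x_unl, pseudo_y, weights)
```

In this sketch the unlabeled points are drawn uniformly, whereas the abstract states that $\texttt{TSBO}$ draws them from a fitted extreme value distribution or a learned sampling distribution; that component is deliberately left out here because the abstract does not specify it.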