Bayesian optimization (BO) methods choose sample points by optimizing an acquisition function derived from a statistical model of the objective. These acquisition functions are chosen to balance sampling regions with predicted good objective values against exploring regions where the objective is uncertain. Standard acquisition functions are myopic, considering only the impact of the next sample, but non-myopic acquisition functions may be more effective. In principle, one could model the sampling by a Markov decision process, and optimally choose the next sample by maximizing an expected reward computed by dynamic programming; however, this is infeasibly expensive. More practical approaches, such as rollout, consider a parametric family of sampling policies. In this paper, we show how to efficiently estimate rollout acquisition functions and their gradients, enabling stochastic gradient-based optimization of sampling policies.
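As a point of reference for the myopic baseline mentioned above, the following is a minimal sketch of expected improvement (EI), a standard one-step acquisition function. It is not the rollout estimator developed in this paper. It assumes a minimization problem and a Gaussian posterior at each candidate point; the function names, the `(mu, sigma)` candidate representation, and the numerical values are illustrative assumptions rather than anything taken from the paper.

```python
import math


def expected_improvement(mu, sigma, best):
    """One-step (myopic) expected improvement for minimization.

    Assumes the model's posterior at the candidate is N(mu, sigma^2) and
    that `best` is the lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))           # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # standard normal PDF
    # First term rewards a good predicted mean, second term rewards uncertainty.
    return (best - mu) * cdf + sigma * pdf


def pick_next(candidates, best):
    """Return the index of the candidate (mu, sigma) pair with the largest EI."""
    scores = [expected_improvement(mu, sigma, best) for mu, sigma in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical posterior summaries at two candidate points:
# a confident, slightly better prediction and an uncertain, worse one.
candidates = [(0.2, 0.05), (0.5, 0.6)]
chosen = pick_next(candidates, best=0.3)
```

The two terms of the closed form make the trade-off between good predicted values and uncertainty explicit, but the score accounts only for the very next sample. A rollout acquisition function instead scores a candidate by the expected reward accumulated when a base policy such as this one is followed for several further steps.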