Bayesian optimization is a methodology for optimizing black-box functions. Traditionally, it focuses on the setting where the search space can be queried arbitrarily. However, many real-life problems do not offer this flexibility; in particular, the search space of the next query may depend on previous ones. In the physical sciences, such challenges arise as local movement constraints, required monotonicity in certain variables, and transitions that influence the accuracy of measurements. Altogether, such transition constraints necessitate a form of planning. This work extends Bayesian optimization via the framework of Markov Decision Processes, iteratively solving a tractable linearization of our objective using reinforcement learning to obtain a policy that plans ahead over long horizons. The resulting policy is potentially history-dependent and non-Markovian. We showcase applications in chemical reactor optimization, informative path planning, and machine calibration, as well as synthetic examples.
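To make the idea of a transition constraint concrete, the following toy sketch plans a query path on a 1-D grid where each query may move at most `max_step` cells from the previous one. It is an illustration under stated assumptions, not the method of this work: the per-location `reward` array stands in for a linearized utility, the function name `plan_path` and its parameters are hypothetical, and exact finite-horizon dynamic programming is swapped in for reinforcement learning.

```python
import numpy as np


def plan_path(reward, start, horizon, max_step=1):
    """Plan `horizon` constrained moves that maximize the summed reward.

    Assumption (toy setting): the objective is linear in the visited
    locations, so a revisited location is rewarded again.
    """
    reward = np.asarray(reward, dtype=float)
    n = len(reward)
    # value[t, s]: best total reward obtainable from state s with t moves left
    value = np.zeros((horizon + 1, n))
    best_next = np.zeros((horizon + 1, n), dtype=int)
    for t in range(1, horizon + 1):
        for s in range(n):
            # Transition constraint: only nearby locations are reachable.
            lo, hi = max(0, s - max_step), min(n - 1, s + max_step)
            candidates = np.arange(lo, hi + 1)
            totals = reward[candidates] + value[t - 1, candidates]
            k = int(np.argmax(totals))
            value[t, s] = totals[k]
            best_next[t, s] = candidates[k]
    # Roll the stored decisions forward from the start state.
    path, s = [start], start
    for t in range(horizon, 0, -1):
        s = int(best_next[t, s])
        path.append(s)
    return path, float(value[horizon, start])


# Hypothetical utility with a single informative location at index 3.
path, total = plan_path([0, 0, 0, 5, 0], start=0, horizon=4)
print(path, total)
```

A greedy one-step rule would see no benefit in leaving index 0, since both reachable cells have zero reward. Planning over the whole horizon walks toward index 3 and stays there, which is the kind of look-ahead that transition constraints call for.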