We consider a sequential decision-making task in which the goal is to optimize an unknown function without evaluating parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process (GP) prior on the unknown functions and to allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. Combining this exploration criterion with a well-known Bayesian optimization acquisition function yields a novel safe Bayesian optimization selection criterion. Our approach is naturally applicable to continuous domains and does not require additional explicit hyperparameters. We theoretically analyze the method and show that, with high probability, it does not violate the safety constraint and that it learns the value of the safe optimum up to arbitrary precision. Empirical evaluations demonstrate improved data efficiency and scalability.
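As an illustration of the common approach mentioned in the abstract (a GP prior with evaluations restricted to regions that are safe with high probability), the following is a minimal sketch and not the information-theoretic criterion proposed in the paper. It assumes a zero-mean GP with a squared-exponential kernel, a safety threshold of 0, a confidence multiplier `beta`, and a discretized one-dimensional domain; all function names, parameter values, and data points are hypothetical choices made for this example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b (assumed kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    # Solve (K + noise * I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v**2, axis=0)
    # Clip tiny negative variances caused by round-off before taking the root.
    return mean, np.sqrt(np.clip(var, 0.0, None))

def safe_mask(mean, std, beta=2.0, threshold=0.0):
    """Mark points whose lower confidence bound lies above the safety threshold."""
    return mean - beta * std >= threshold

# Hypothetical constraint observations near a known safe seed at x = 0.
x_train = np.array([0.0, 0.2])
y_train = np.array([1.0, 0.9])
grid = np.linspace(-2.0, 2.0, 201)  # discretized domain, for illustration only

mean, std = gp_posterior(x_train, y_train, grid)
mask = safe_mask(mean, std)
print(f"{mask.sum()} of {mask.size} grid points are classified as safe")
```

In this sketch the safe set is a neighborhood of the observed points, and it can only grow as further constraint observations shrink the posterior uncertainty. The grid is used here purely for simplicity; the abstract states that the proposed method does not rely on such a discretization.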