We consider the problem of sequentially maximising an unknown function over a set of actions while ensuring that every sampled point has a function value below a given safety threshold. We model the function using kernel-based and Gaussian process methods, while differing from previous works in our assumption that the function is monotonically increasing with respect to a \emph{safety variable}. This assumption is motivated by various practical applications such as adaptive clinical trial design and robotics. Taking inspiration from the \textsc{\sffamily GP-UCB} and \textsc{\sffamily SafeOpt} algorithms, we propose an algorithm, monotone safe {\sffamily UCB} (\textsc{\sffamily M-SafeUCB}), for this task. We show that \textsc{\sffamily M-SafeUCB} enjoys theoretical guarantees in terms of safety, a suitably defined regret notion, and approximately finding the entire safe boundary. In addition, we illustrate that the monotonicity assumption yields significant benefits in terms of the guarantees obtained, as well as algorithmic simplicity and efficiency. We support our theoretical findings by performing empirical evaluations on a variety of functions, including a simulated clinical trial experiment.