Preferential Bayesian optimization (PBO) is a framework for optimizing a decision maker's latent utility function using preference feedback. This work introduces the expected utility of the best option (qEUBO) as a novel acquisition function for PBO. When the decision maker's responses are noise-free, we show that qEUBO is one-step Bayes optimal and thus equivalent to the popular knowledge gradient acquisition function. We also show that qEUBO enjoys an additive constant approximation guarantee to the one-step Bayes-optimal policy when the decision maker's responses are corrupted by noise. We provide an extensive evaluation of qEUBO and demonstrate that it outperforms the state-of-the-art acquisition functions for PBO across many settings. Finally, we show that, under sufficient regularity conditions, qEUBO's Bayesian simple regret converges to zero at a rate $o(1/n)$ as the number of queries, $n$, goes to infinity. In contrast, we show that simple regret under qEI, a popular acquisition function for standard BO often used for PBO, can fail to converge to zero. Enjoying superior performance, simple computation, and a grounded decision-theoretic justification, qEUBO is a promising acquisition function for PBO.
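To make the quantity behind the name concrete, the following is a minimal sketch that reads "expected utility of the best option" literally: the posterior expectation of the largest latent utility among the options shown in a query. It assumes a query of two options whose latent utilities are jointly Gaussian under the current posterior, in which case the expectation has a well-known closed form. The function names, the restriction to pairs, and the Gaussian posterior are illustrative assumptions, not the authors' implementation.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_max_of_pair(mu1, mu2, var1, var2, cov):
    """E[max(f1, f2)] for jointly Gaussian latent utilities (f1, f2).

    Hypothetical helper: mu*, var*, and cov are assumed to come from
    some posterior over the decision maker's utility at two options.
    """
    # Variance of the difference f1 - f2.
    theta_sq = var1 + var2 - 2.0 * cov
    if theta_sq <= 1e-12:
        # The difference is (numerically) deterministic, so the
        # expected maximum is just the larger posterior mean.
        return max(mu1, mu2)
    theta = math.sqrt(theta_sq)
    d = (mu1 - mu2) / theta
    return mu1 * normal_cdf(d) + mu2 * normal_cdf(-d) + theta * normal_pdf(d)


# Two candidate queries with the same posterior means: the pair with
# less correlated utilities has the larger expected best-option utility.
correlated = expected_max_of_pair(0.0, 0.0, 1.0, 1.0, 0.9)
independent = expected_max_of_pair(0.0, 0.0, 1.0, 1.0, 0.0)
print(correlated < independent)
```

Under these assumptions, a query scores highly when its options have high posterior means or when the posterior is uncertain about which of them is better, which is one way to see why maximizing such a quantity yields informative comparisons.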