Bayesian optimization (BO) is a popular and effective approach for tuning expensive, noisy experiments, but it requires an explicit objective function. Preferential BO (PBO) removes this requirement by learning from pairwise human feedback, yet existing methods rely on global search and therefore struggle to optimize efficiently beyond low- and medium-dimensional problems. We address this limitation by developing a family of local PBO methods that transfer key ideas from high-dimensional BO to the preferential setting. In particular, these methods adapt trust-region and derivative-informed local search to pairwise preference feedback, where the latter exploits first- and second-order derivatives of the Laplace-approximated Gaussian process (GP) posterior. Our benchmarks on GP sample paths, standard optimization test functions, and policy-search tasks show that local PBO methods are especially effective in high-dimensional, complex landscapes with steep optima. Compared with global preference-based baselines, they can substantially reduce cumulative regret, making them particularly useful for real-world preference-based optimization tasks such as policy search.
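As a rough illustration of the ingredients named in the abstract, the following Python sketch fits a GP to pairwise duels with a Laplace approximation and evaluates the gradient of the posterior mean at the current incumbent, which is the kind of first-order information a derivative-informed local search could follow. It is not the authors' implementation: the logistic duel likelihood, the RBF kernel, the fixed lengthscale, the synthetic quadratic utility, and all function names (`rbf_kernel`, `laplace_map`, `posterior_mean_and_grad`) are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def laplace_map(X, duels, lengthscale=1.0, jitter=1e-8, iters=100):
    """Newton iterations for the MAP latent utilities (assumed logistic duel likelihood).

    duels is a list of (winner_index, loser_index) pairs into the rows of X.
    Returns the MAP latent values f and alpha = K^{-1} f.
    """
    n = len(X)
    K = rbf_kernel(X, X, lengthscale) + jitter * np.eye(n)
    # Each row of A maps latent values to one duel difference f[winner] - f[loser].
    A = np.zeros((len(duels), n))
    for row, (winner, loser) in enumerate(duels):
        A[row, winner], A[row, loser] = 1.0, -1.0
    f = np.zeros(n)
    for _ in range(iters):
        s = 1.0 / (1.0 + np.exp(-(A @ f)))      # modelled P(winner beats loser)
        g = A.T @ (1.0 - s)                     # gradient of the log-likelihood
        W = (A.T * (s * (1.0 - s))) @ A         # negative Hessian of the log-likelihood
        # Newton step written without forming K^{-1}: f_new = K (I + W K)^{-1} (W f + g)
        f_new = K @ np.linalg.solve(np.eye(n) + W @ K, W @ f + g)
        converged = np.linalg.norm(f_new - f) < 1e-10
        f = f_new
        if converged:
            break
    s = 1.0 / (1.0 + np.exp(-(A @ f)))
    alpha = A.T @ (1.0 - s)                     # equals K^{-1} f at the posterior mode
    return f, alpha


def posterior_mean_and_grad(x, X, alpha, lengthscale=1.0):
    """Laplace posterior mean k(x, X) @ alpha and its gradient with respect to x."""
    k = rbf_kernel(x[None, :], X, lengthscale)[0]             # shape (n,)
    dk = -(x[None, :] - X) / lengthscale**2 * k[:, None]      # d k(x, x_i) / dx, shape (n, d)
    return k @ alpha, dk.T @ alpha


# Synthetic data: the hidden utility only simulates the answers a human would give.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(6, 2))
hidden_utility = -(X**2).sum(axis=1)
duels = [
    (i, j) if hidden_utility[i] > hidden_utility[j] else (j, i)
    for i in range(len(X))
    for j in range(i + 1, len(X))
]

f_hat, alpha = laplace_map(X, duels)
incumbent = X[np.argmax(f_hat)]
mean, grad = posterior_mean_and_grad(incumbent, X, alpha)
```

A local method might take a small step along `grad`, or restrict candidate points to a trust region around `incumbent`, when proposing the next duel. Second-order information, hyperparameter fitting, and the acquisition rule are omitted here.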