Bayesian optimization (BO) is a popular and effective approach for tuning expensive, noisy experiments, but it requires an explicit objective function. Preferential BO (PBO) removes this requirement by learning from pairwise human feedback. However, existing PBO methods rely on global search and therefore struggle to optimize efficiently beyond low- and medium-dimensional problems. We address this limitation with a family of local PBO methods that transfer key ideas from high-dimensional BO to the preferential setting. Specifically, these methods adapt trust-region and derivative-informed local search to pairwise preference feedback; the derivative-informed variant exploits first- and second-order derivatives of the Laplace-approximated Gaussian process (GP) posterior. Our benchmarks on GP sample paths, standard optimization benchmark functions, and policy-search tasks show that local PBO methods are especially effective in high-dimensional, complex landscapes with steep optima. Compared with global preference-based baselines, they can substantially reduce cumulative regret, which makes them particularly useful for real-world preference-based optimization tasks such as policy search.
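The derivative-informed variant depends on a Laplace-approximated preferential GP and on derivatives of its posterior. The sketch below illustrates these ingredients under stated assumptions and is not the authors' implementation. It assumes a probit pairwise likelihood with noise level `sigma`, a squared-exponential kernel with a fixed lengthscale, Newton iterations for the posterior mode, and closed-form gradient and Hessian of the posterior mean. The helper names (`laplace_mode`, `mean_derivatives`, `local_step`) and the step rule in `local_step` are hypothetical. That rule takes a Newton ascent step on the posterior mean when its Hessian is negative definite and a gradient step otherwise, with the step length capped at a fixed radius.

```python
# Hypothetical sketch (not the paper's code): probit preference GP with a
# Laplace approximation, plus derivatives of its posterior mean.
import itertools
import math

import numpy as np


def rbf(A, B, ell):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)


def probit_ratio(z):
    """Return (z, phi(z) / Phi(z)); erfc keeps Phi accurate for negative z."""
    z = max(z, -30.0)  # guard against underflow in this sketch
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return z, pdf / cdf


def laplace_mode(X, duels, sigma=0.1, ell=0.5, iters=50, jitter=1e-6):
    """Newton iterations for the mode f_hat of p(f | duels).

    duels holds (winner, loser) index pairs; the assumed likelihood is
    P(x_u preferred to x_v) = Phi((f_u - f_v) / (sqrt(2) * sigma)).
    Returns f_hat and alpha = K^{-1} f_hat.
    """
    n = len(X)
    K = rbf(X, X, ell) + jitter * np.eye(n)
    s = math.sqrt(2.0) * sigma
    f = np.zeros(n)
    for _ in range(iters):
        grad = np.zeros(n)    # gradient of the log-likelihood
        W = np.zeros((n, n))  # negative Hessian of the log-likelihood
        for u, v in duels:
            z, r = probit_ratio((f[u] - f[v]) / s)
            grad[u] += r / s
            grad[v] -= r / s
            c = r * (z + r) / s**2
            W[u, u] += c
            W[v, v] += c
            W[u, v] -= c
            W[v, u] -= c
        # Newton step: (K^-1 + W)^-1 (W f + grad) = K (I + W K)^-1 (W f + grad)
        f_new = K @ np.linalg.solve(np.eye(n) + W @ K, W @ f + grad)
        converged = np.max(np.abs(f_new - f)) < 1e-9
        f = f_new
        if converged:
            break
    return f, np.linalg.solve(K, f)


def mean_derivatives(x, X, alpha, ell=0.5):
    """Posterior mean mu(x) = k(x, X) @ alpha, its gradient and its Hessian."""
    diff = x[None, :] - X                                  # shape (n, d)
    w = alpha * np.exp(-0.5 * (diff**2).sum(1) / ell**2)  # alpha_i * k(x, x_i)
    mu = w.sum()
    grad = -(w[:, None] * diff).sum(0) / ell**2
    hess = (np.einsum("i,ij,ik->jk", w, diff, diff) / ell**4
            - w.sum() * np.eye(X.shape[1]) / ell**2)
    return mu, grad, hess


def local_step(x, grad, hess, radius=0.1):
    """Hypothetical step rule: Newton ascent if hess is negative definite,
    otherwise gradient ascent; the step length is capped at radius."""
    if np.all(np.linalg.eigvalsh(hess) < 0):
        step = -np.linalg.solve(hess, grad)
    else:
        step = grad
    norm = np.linalg.norm(step)
    if norm > radius:
        step = step * (radius / norm)
    return x + step


# Usage: simulate duels from a hidden utility g, then take one local step.
X = np.array([[0.1, 0.2], [0.3, 0.7], [0.5, 0.5], [0.9, 0.1], [0.8, 0.8]])
g = -((X - 0.5) ** 2).sum(1)
duels = [(i, j) if g[i] > g[j] else (j, i)
         for i, j in itertools.combinations(range(len(X)), 2)]
f_hat, alpha = laplace_mode(X, duels)
x0 = X[np.argmax(f_hat)]
mu, grad, hess = mean_derivatives(x0, X, alpha)
x_next = local_step(x0, grad, hess)
print(x0, x_next)
```

The Newton update is written as K(I + WK)^{-1}(Wf + ∇ log p) so that K is never inverted inside the loop. W is not diagonal here because each duel couples the latent values of two points.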