Preference-based optimization algorithms are iterative procedures that seek the optimal calibration of a decision vector based only on comparisons between pairs of different tunings. At each iteration, a human decision-maker expresses a preference between two calibrations (samples), indicating which one, if any, is better than the other. The optimization procedure must use the observed preferences to find the tuning of the decision vector that the decision-maker prefers most, while keeping the number of comparisons as small as possible. In this work, we formulate the preference-based optimization problem from a utility theory perspective. Then, we propose GLISp-r, an extension of a recent preference-based optimization procedure called GLISp. The latter uses a Radial Basis Function surrogate to describe the tastes of the decision-maker. At each iteration, GLISp proposes a new sample to compare with the best calibration available by trading off exploitation of the surrogate model and exploration of the decision space. In GLISp-r, we propose a different criterion for selecting new candidate samples, inspired by MSRS, a popular procedure in the black-box optimization framework. Compared to GLISp, GLISp-r is less likely to get stuck in local optima of the preference-based optimization problem. We support this claim theoretically, with a proof of global convergence, and empirically, by comparing the performance of GLISp and GLISp-r on several benchmark optimization problems.
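To make the comparison-only setting described in the abstract concrete, the following is a minimal sketch of a preference oracle and a naive preference-driven search loop. It is not GLISp or GLISp-r: there is no Radial Basis Function surrogate and no exploration criterion, and candidates are simply drawn uniformly at random. The names `preference`, `hidden_utility`, and `preference_search`, the sign convention (-1 means the first sample is preferred), and the test function are all assumptions introduced for illustration.

```python
import random


def preference(x_i, x_j, utility, tol=1e-9):
    """Compare two samples through a latent utility (lower is better).

    Returns -1 if x_i is preferred, 1 if x_j is preferred, and 0 if the
    decision-maker is indifferent. The sign convention is an assumption.
    """
    f_i, f_j = utility(x_i), utility(x_j)
    if abs(f_i - f_j) <= tol:
        return 0
    return -1 if f_i < f_j else 1


def hidden_utility(x):
    # Hypothetical latent function standing in for the decision-maker's
    # tastes; the search loop never reads its values, only comparisons.
    return sum((x_k - 0.3) ** 2 for x_k in x)


def preference_search(n_comparisons, dim=2, lower=-1.0, upper=1.0, seed=0):
    """Naive baseline: compare random candidates with the current best."""
    rng = random.Random(seed)
    best = [rng.uniform(lower, upper) for _ in range(dim)]
    for _ in range(n_comparisons):
        candidate = [rng.uniform(lower, upper) for _ in range(dim)]
        # Keep the candidate only if it is strictly preferred to the best.
        if preference(candidate, best, hidden_utility) == -1:
            best = candidate
    return best


best = preference_search(200)
```

A surrogate-based method would replace the uniform draw of `candidate` with a sample chosen by trading off exploitation of a model fitted to the observed preferences against exploration of the decision space, which is the step where GLISp and GLISp-r differ.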