State-of-the-art approaches to multi-objective optimization either assume a given utility function, learn a utility function interactively, or try to determine the complete Pareto front, which requires post-hoc elicitation of the preferred result. However, in real-world problems the choice of a result is often based on implicit and explicit expert knowledge, which makes it difficult to define a utility function, while interactive learning and post-hoc elicitation require repeated and expensive expert involvement. To mitigate this, we learn a utility function offline from expert knowledge by means of preference learning. In contrast to other works, we use not only (pairwise) result preferences but also coarse information about the utility function space. This improves the utility function estimate, especially when very few results are available. Additionally, we model the uncertainties that occur in the utility function learning task and propagate them through the whole optimization chain. Our method for learning a utility function eliminates the need for repeated expert involvement while still leading to high-quality results. We show the sample efficiency and quality gains of the proposed method in four domains, especially in cases where the surrogate utility function cannot exactly capture the true expert utility function. We also show that, to obtain good results, it is important to consider the induced uncertainties, and we analyze the effect of biased samples, which are a common problem in real-world domains.
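To make the idea of learning a utility function offline from pairwise result preferences more concrete, the following is a minimal sketch and not the method proposed in this work. It assumes a linear surrogate utility u(x) = w · x, a Bradley-Terry (logistic) preference likelihood, and a Gaussian prior centered on a rough expert guess `prior_mean` that stands in for "coarse information about the utility function space". The function name `fit_linear_utility`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def fit_linear_utility(X, prefs, prior_mean, prior_strength=1.0, lr=0.1, steps=2000):
    """MAP estimate of the weights w of an assumed linear utility u(x) = w @ x.

    X          : (n_results, n_objectives) array of objective values.
    prefs      : list of (i, j) pairs meaning "result i is preferred over result j".
    prior_mean : coarse expert guess for w (assumption: Gaussian prior around it).
    """
    X = np.asarray(X, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    w = prior_mean.copy()
    # Objective differences between the preferred and the rejected result.
    diffs = np.array([X[i] - X[j] for i, j in prefs])
    for _ in range(steps):
        # Bradley-Terry probability that the stated preference holds under w.
        p = 1.0 / (1.0 + np.exp(-(diffs @ w)))
        # Gradient of the log-likelihood plus the log of the Gaussian prior.
        grad = diffs.T @ (1.0 - p) - prior_strength * (w - prior_mean)
        w += lr * grad / len(prefs)  # plain gradient ascent
    return w


# Toy data: four results with two objectives and six expert preferences.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.1], [0.2, 0.9]])
prefs = [(0, 1), (0, 2), (2, 3), (3, 1), (2, 1), (0, 3)]
w = fit_linear_utility(X, prefs, prior_mean=[0.5, 0.5])
print("estimated weights:", w)
```

With only a handful of preferences the prior dominates the estimate, which is one simple way to see why additional coarse information can help when very few results are available. The sketch returns a point estimate only; it does not model or propagate the uncertainty of the learned utility function.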