Quantifying numerical preferences over the objectives of a multi-objective decision-making problem is challenging. Demonstrations from a user, however, are often available. We propose an algorithm that infers linear preference weights from optimal or near-optimal demonstrations. The algorithm is evaluated in three environments against two baseline methods. Empirical results show significant improvements over the baselines in both time requirements and accuracy of the inferred preferences. In future work, we plan to evaluate the algorithm in a multi-agent system in which one agent uses our preference inference algorithm to infer the preferences of an opponent.
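To make the problem setting concrete, the following is a minimal sketch of what inferring linear preference weights from a demonstration can look like for two objectives. It is not the algorithm proposed here, whose details the abstract does not give; it substitutes a simple max-margin grid search. The function name `infer_weights`, its parameters, and the example return vectors are all hypothetical, chosen only for illustration.

```python
def scalarize(weights, returns):
    """Linear scalarization: weighted sum of a vector-valued return."""
    return sum(w * r for w, r in zip(weights, returns))


def infer_weights(demo_return, candidate_returns, steps=100):
    """Illustrative max-margin grid search (an assumption, not the paper's method).

    Searches weights w = (w1, 1 - w1) on the 2-objective simplex and returns
    the weight vector under which the demonstrated return beats the best
    alternative return by the largest margin, together with that margin.
    A negative margin means the demonstration is only near-optimal under w.
    """
    best_w, best_margin = None, float("-inf")
    for i in range(steps + 1):
        w = (i / steps, 1 - i / steps)
        demo_value = scalarize(w, demo_return)
        best_other = max(scalarize(w, r) for r in candidate_returns)
        margin = demo_value - best_other
        if margin > best_margin:
            best_w, best_margin = w, margin
    return best_w, best_margin


# Hypothetical vector returns of alternative policies (objective 1, objective 2).
alternatives = [(10.0, 0.0), (0.0, 10.0)]
# Hypothetical demonstrated return: a balanced trade-off between the objectives.
weights, margin = infer_weights((6.0, 6.0), alternatives)
print(weights, margin)
```

A grid search only scales to a handful of objectives; with more objectives, the same max-margin criterion would typically be posed as a linear program over the weight simplex.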