For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.
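To make the contrast between the Pareto front and a distribution-level comparison concrete, here is a minimal sketch. The abstract does not state the dominance criterion, so the sketch assumes a componentwise comparison of multivariate CDFs (a first-order stochastic dominance style test): one return distribution dominates another when its CDF is never larger and is strictly smaller somewhere. The function names and the two toy policies are hypothetical and are not taken from the paper.

```python
import itertools
import numpy as np

# A return distribution is a pair (outcomes, probs): outcomes has one row per
# possible multi-objective return, probs holds the matching probabilities.

def expected_return(dist):
    """Expected return vector of a discrete return distribution."""
    outcomes, probs = dist
    return np.asarray(probs, float) @ np.asarray(outcomes, float)

def pareto_dominates(u, v):
    """u is at least as good as v in every objective and better in one."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return bool(np.all(u >= v) and np.any(u > v))

def cdf(dist, point):
    """P(Z <= point), with the inequality taken componentwise."""
    outcomes, probs = dist
    below = np.all(np.asarray(outcomes, float) <= point, axis=1)
    return float(np.asarray(probs, float)[below].sum())

def distributionally_dominates(p, q, tol=1e-12):
    """ASSUMED criterion: F_p <= F_q everywhere and F_p < F_q somewhere.

    For discrete distributions the CDFs are step functions, so checking the
    grid spanned by the coordinates of both supports is sufficient.
    """
    both = np.vstack([np.asarray(p[0], float), np.asarray(q[0], float)])
    axes = [np.unique(both[:, d]) for d in range(both.shape[1])]
    strict = False
    for point in itertools.product(*axes):
        fp, fq = cdf(p, np.array(point)), cdf(q, np.array(point))
        if fp > fq + tol:
            return False  # p puts more mass on low returns at this point
        if fp < fq - tol:
            strict = True
    return strict

# Hypothetical policies: "safe" always returns (2, 2); "risky" returns (0, 0)
# or (5, 5) with equal probability, so its expected return is (2.5, 2.5).
safe = ([[2.0, 2.0]], [1.0])
risky = ([[0.0, 0.0], [5.0, 5.0]], [0.5, 0.5])

print(pareto_dominates(expected_return(risky), expected_return(safe)))
print(distributionally_dominates(risky, safe))
print(distributionally_dominates(safe, risky))
```

In this toy case the risky policy Pareto-dominates the safe one in expectation, so a Pareto front over expected returns would discard the safe policy. Under the assumed CDF-based test neither policy dominates the other, so both would be kept, which matches the abstract's claim that a distributional set can retain policies a risk-averse decision maker might prefer.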