For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.
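To make the contrast between the Pareto front and a distribution-level comparison concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not spell out the dominance criterion, so the sketch assumes strict first-order stochastic dominance on joint cumulative distribution functions as a stand-in. The function names, the dictionary representation of return distributions, and the three toy policies are all hypothetical.

```python
from itertools import product


def pareto_dominates(a, b):
    """True if vector a is at least as good as b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def expected_return(dist):
    """Expected return vector of a distribution given as {return_tuple: probability}."""
    dims = len(next(iter(dist)))
    return tuple(sum(p * r[i] for r, p in dist.items()) for i in range(dims))


def cdf(dist, v):
    """Joint CDF: probability that the return is <= v in every objective."""
    return sum(p for r, p in dist.items() if all(x <= y for x, y in zip(r, v)))


def stochastically_dominates(d1, d2, tol=1e-12):
    """Assumed criterion: CDF of d1 is never above that of d2, and strictly below somewhere.

    For finite supports it suffices to compare the CDFs on the grid spanned by
    the support coordinates of both distributions.
    """
    dims = len(next(iter(d1)))
    axes = [sorted({r[i] for r in list(d1) + list(d2)}) for i in range(dims)]
    diffs = [cdf(d2, v) - cdf(d1, v) for v in product(*axes)]
    return all(d >= -tol for d in diffs) and any(d > tol for d in diffs)


# Hypothetical two-objective return distributions of three policies.
policy_a = {(2, 2): 1.0}               # always returns (2, 2)
policy_b = {(0, 0): 0.5, (3, 3): 0.5}  # risky: sometimes reaches (3, 3)
policy_c = {(1, 1): 1.0}               # always returns (1, 1)

if __name__ == "__main__":
    ea, eb = expected_return(policy_a), expected_return(policy_b)
    print("A Pareto-dominates B in expectation:", pareto_dominates(ea, eb))
    print("A distributionally dominates B:", stochastically_dominates(policy_a, policy_b))
    print("A distributionally dominates C:", stochastically_dominates(policy_a, policy_c))
```

In this toy setting, policy B is dominated once only expected returns are compared, yet it is not dominated at the distribution level under the assumed criterion. A decision maker who only values reaching at least (3, 3) would prefer B, which illustrates how a set built on return distributions can retain policies that an expectation-based Pareto front discards.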