Many decision problems cannot be solved exactly; instead, several estimation algorithms are used to assign scores to the available options. The errors of these estimates can be correlated to varying degrees, from weakly (e.g., between two very different approaches) to strongly (e.g., between runs of the same algorithm with different hyperparameters). Most aggregation rules suffer from this diversity of correlations. In this article, we propose several aggregation rules that take correlations into account, and we compare them to naive rules in experiments on synthetic data. Our results show that when enough is known about the correlations between errors, maximum likelihood aggregation should be preferred. Otherwise, typically when training data is limited, we recommend a method that we call Embedded Voting (EV).
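To make the role of correlations concrete, here is a minimal sketch of maximum likelihood aggregation under one common set of assumptions: every algorithm's score for an option equals the true value plus a zero-mean Gaussian error, and the covariance matrix of those errors is known. Under these assumptions the maximum likelihood estimate is a weighted mean with weights proportional to $\Sigma^{-1}\mathbf{1}$. The function names, the covariance matrix, and the scores are all hypothetical and chosen for illustration; this is not necessarily the exact model or implementation used in the article.

```python
import numpy as np

def ml_weights(cov):
    """Weights of the ML estimate, assuming zero-mean Gaussian errors
    with known covariance `cov` (an illustrative assumption)."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)   # solves cov @ w = 1, i.e. w = cov^{-1} 1
    return w / w.sum()               # normalize so the weights sum to 1

def error_variance(weights, cov):
    """Variance of the aggregated error for a given weight vector."""
    return weights @ cov @ weights

def aggregate(scores, weights):
    """Combine a (n_algorithms, n_options) score matrix into one score per option."""
    return weights @ scores

# Hypothetical setup: algorithms 0-2 are near-copies of each other
# (pairwise error correlation 0.9); algorithm 3 is independent of them.
cov = np.array([
    [1.0, 0.9, 0.9, 0.0],
    [0.9, 1.0, 0.9, 0.0],
    [0.9, 0.9, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

w_ml = ml_weights(cov)
w_mean = np.full(4, 0.25)            # naive rule: plain average of the scores

# Made-up scores for three options, one row per algorithm.
scores = np.array([
    [0.6, 0.5, 0.1],
    [0.7, 0.5, 0.2],
    [0.6, 0.4, 0.1],
    [0.2, 0.9, 0.3],
])

print("ML weights:   ", np.round(w_ml, 3))
print("ML variance:  ", round(error_variance(w_ml, cov), 4))
print("Mean variance:", round(error_variance(w_mean, cov), 4))
print("ML scores:    ", np.round(aggregate(scores, w_ml), 3))
```

In this toy setup the three correlated algorithms each receive a weight of 5/29, while the independent one receives 14/29: the group of near-duplicates is treated as roughly one source of information rather than three. The plain average gives the group three quarters of the total weight and ends up with a larger error variance.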