Ensemble methods combine the predictions of several base models. We study whether including more models always improves their average performance. The answer depends on the kind of ensemble considered, as well as on the predictive metric chosen. We focus on situations where all members of the ensemble are a priori expected to perform equally well, which is the case for several popular methods such as random forests or deep ensembles. In this setting, we show that ensembles are getting better all the time if, and only if, the considered loss function is convex. More precisely, in that case, the average loss of the ensemble is a decreasing function of the number of models. When the loss function is nonconvex, we show a series of results that can be summarised as: ensembles of good models keep getting better, and ensembles of bad models keep getting worse. To this end, we prove a new result on the monotonicity of tail probabilities that may be of independent interest. We illustrate our results on a medical prediction problem (diagnosing melanomas using neural nets) and a "wisdom of crowds" experiment (guessing the ratings of upcoming movies).
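The contrast between convex and nonconvex losses can be made concrete with a toy calculation. The sketch below is not the paper's experimental setup: it assumes, purely for illustration, that each model's prediction is an independent Gaussian draw with mean `mu` and standard deviation `sigma`, that the target is 0, and that the ensemble averages its members. The function names, the threshold `t`, and the chosen values of `mu` are all hypothetical. Squared error stands in for a convex loss, and the indicator that the averaged prediction misses the target by more than `t` (a tail probability) stands in for a nonconvex one.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_squared_loss(n: int, mu: float, sigma: float = 1.0) -> float:
    """Expected squared error of the mean of n predictions (target = 0).

    Assumes i.i.d. N(mu, sigma^2) predictions, so the mean is
    N(mu, sigma^2 / n) and the loss is bias^2 + variance.
    """
    return mu ** 2 + sigma ** 2 / n


def expected_threshold_loss(n: int, mu: float, sigma: float = 1.0,
                            t: float = 1.0) -> float:
    """Probability that the mean of n predictions misses 0 by more than t.

    This is the expectation of a nonconvex (indicator) loss, i.e. a
    two-sided tail probability of N(mu, sigma^2 / n).
    """
    s = sigma / math.sqrt(n)
    inside = normal_cdf((t - mu) / s) - normal_cdf((-t - mu) / s)
    return 1.0 - inside


sizes = [1, 2, 5, 10, 50]

# Convex loss: decreases with ensemble size even for biased models.
convex = [expected_squared_loss(n, mu=0.5) for n in sizes]

# Nonconvex loss, "good" models (mean prediction within t of the target).
good = [expected_threshold_loss(n, mu=0.0) for n in sizes]

# Nonconvex loss, "bad" models (mean prediction further than t from the target).
bad = [expected_threshold_loss(n, mu=2.0) for n in sizes]

for n, c, g, b in zip(sizes, convex, good, bad):
    print(f"n={n:3d}  squared={c:.4f}  tail(good)={g:.4f}  tail(bad)={b:.4f}")
```

Under these assumptions the squared-error column shrinks as `n` grows, the tail loss of the well-centred models also shrinks, and the tail loss of the badly centred models grows towards 1, which mirrors the qualitative statement in the abstract.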