Ensemble methods combine the predictions of several base models. We study whether adding more models always improves the ensemble's average performance. The answer depends on the kind of ensemble considered, as well as on the predictive metric chosen. We focus on situations where all members of the ensemble are a priori expected to perform equally well, which is the case for several popular methods such as random forests or deep ensembles. In this setting, we show that ensembles are getting better all the time if, and only if, the considered loss function is convex. More precisely, in that case, the loss of the ensemble is a decreasing function of the number of models. When the loss function is nonconvex, we show a series of results that can be summarised as: ensembles of good models keep getting better, and ensembles of bad models keep getting worse. To this end, we prove a new result on the monotonicity of tail probabilities that may be of independent interest. We illustrate our results on a medical problem (diagnosing melanomas using neural nets) and a "wisdom of crowds" experiment (guessing the ratings of upcoming movies).
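To make the convex case concrete, here is a minimal sketch of the standard Jensen-type argument behind such a monotonicity statement; the notation ($Y_i$ for the $i$-th model's prediction, $\bar Y_n$ for the ensemble average, $\ell$ for the loss) is introduced here for illustration and is not taken from the paper. Writing the ensemble average as a mean of leave-one-out averages,
\[
  \bar Y_n \;=\; \frac{1}{n}\sum_{i=1}^{n} Y_i
  \;=\; \frac{1}{n}\sum_{j=1}^{n} \bar Y_{n-1}^{(-j)},
  \qquad
  \bar Y_{n-1}^{(-j)} := \frac{1}{n-1}\sum_{i \neq j} Y_i,
\]
convexity of $\ell(\cdot, y)$ gives $\ell(\bar Y_n, y) \le \frac{1}{n}\sum_{j=1}^{n} \ell\bigl(\bar Y_{n-1}^{(-j)}, y\bigr)$. Taking expectations and noting that, for exchangeable models, each leave-one-out average $\bar Y_{n-1}^{(-j)}$ has the same distribution as $\bar Y_{n-1}$, one obtains $\mathbb{E}\,\ell(\bar Y_n, Y) \le \mathbb{E}\,\ell(\bar Y_{n-1}, Y)$: under a convex loss, the expected loss cannot increase when a model is added to the ensemble.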