Classical results establish that ensembles of small models benefit when predictive diversity is encouraged, for example through bagging, boosting, and similar techniques. Here we demonstrate that this intuition does not carry over to ensembles of deep neural networks used for classification, and that the opposite can in fact be true. Unlike regression models or small (unconfident) classifiers, large (confident) neural networks produce predictions that concentrate at the vertices of the probability simplex. Decorrelating these predictions therefore necessarily moves the ensemble prediction away from the vertices, which reduces confidence and can push points across decision boundaries. Through large-scale experiments, we demonstrate that diversity-encouraging regularizers hurt the performance of high-capacity deep ensembles used for classification. Even more surprisingly, discouraging predictive diversity can be beneficial. Taken together, these results strongly suggest that the best strategy for deep ensembles is to use more accurate, but likely less diverse, component models.
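The geometric argument above can be illustrated with a small numerical sketch. It is a toy example, not the experimental setup or the regularizers used in this work: the function names `ensemble_mean` and `predictive_diversity` are hypothetical, the diversity score (mean squared distance of members from the ensemble mean) is just one simple choice, and the probability vectors are made up for illustration. It compares two confident ensemble members that agree with two confident members that disagree.

```python
import numpy as np


def ensemble_mean(member_probs):
    """Average per-member class probabilities (shape: members x classes)."""
    return np.asarray(member_probs, dtype=float).mean(axis=0)


def predictive_diversity(member_probs):
    """Illustrative diversity score (an assumption, not the paper's measure):
    mean squared Euclidean distance of each member from the ensemble mean."""
    probs = np.asarray(member_probs, dtype=float)
    deviations = probs - probs.mean(axis=0)
    return float((deviations ** 2).sum(axis=1).mean())


# Two confident members near the same vertex of the 3-class simplex.
agreeing = [[0.98, 0.01, 0.01],
            [0.98, 0.01, 0.01]]

# Two confident members near different vertices (high predictive diversity).
diverse = [[0.98, 0.01, 0.01],
           [0.01, 0.98, 0.01]]

for name, members in [("agreeing", agreeing), ("diverse", diverse)]:
    mean = ensemble_mean(members)
    print(name,
          "| ensemble mean:", mean,
          "| confidence:", mean.max(),
          "| diversity:", predictive_diversity(members))
```

In this toy setting the agreeing pair keeps the ensemble prediction at the vertex, while the diverse pair averages to a point roughly midway between two vertices, so the ensemble's top-class confidence drops to about one half and the predicted class sits on a decision boundary.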