Ample theoretical and empirical evidence has long supported the success of ensemble learning. Deep ensembles in particular take advantage of training randomness and the expressivity of individual neural networks to gain prediction diversity, ultimately leading to better generalization, robustness, and uncertainty estimation. With respect to generalization, it has been found that pursuing wider local minima results in models that are more robust to shifts between training and testing sets. A natural research question arises from these two approaches: can generalization ability be boosted if ensemble learning and loss sharpness minimization are integrated? Our work investigates this connection and proposes DASH, a learning algorithm that promotes diversity and flatness within deep ensembles. More concretely, DASH encourages base learners to move divergently towards low-loss regions of minimal sharpness. We provide a theoretical backbone for our method along with extensive empirical evidence demonstrating an improvement in ensemble generalizability.