Deep Ensembles (DEs) demonstrate improved accuracy, calibration and robustness to perturbations over single neural networks partly due to their functional diversity. Particle-based variational inference (ParVI) methods enhance diversity by formalizing a repulsion term based on a network similarity kernel. However, weight-space repulsion is inefficient due to over-parameterization, while direct function-space repulsion has been found to produce little improvement over DEs. To sidestep these difficulties, we propose First-order Repulsive Deep Ensemble (FoRDE), an ensemble learning method based on ParVI, which performs repulsion in the space of first-order input gradients. As input gradients uniquely characterize a function up to translation and are much smaller in dimension than the weights, this method guarantees that ensemble members are functionally different. Intuitively, diversifying the input gradients encourages each network to learn different features, which is expected to improve the robustness of an ensemble. Experiments on image classification datasets and transfer learning tasks show that FoRDE significantly outperforms the gold-standard DEs and other ensemble methods in accuracy and calibration under covariate shift due to input perturbations.
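To make the idea of repulsion in input-gradient space concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy scalar model f(x) = v * tanh(w . x) whose input gradient is available in closed form, normalizes each gradient to unit length, and compares ensemble members with an RBF kernel on those gradients. The model, the normalization step, the bandwidth, and all function names are illustrative assumptions. A ParVI-style update would use such a kernel to push members with similar input gradients apart.

```python
import math

def input_gradient(w, v, x):
    """Analytic gradient of the toy model f(x) = v * tanh(w . x) w.r.t. the input x."""
    pre = sum(wi * xi for wi, xi in zip(w, x))
    scale = v * (1.0 - math.tanh(pre) ** 2)
    return [scale * wi for wi in w]

def normalize(g, eps=1e-12):
    """Scale a gradient to (approximately) unit length; an assumption of this sketch."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [gi / (norm + eps) for gi in g]

def rbf_kernel(a, b, bandwidth=1.0):
    """RBF similarity between two input-gradient vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def gradient_kernel_matrix(members, x, bandwidth=1.0):
    """Pairwise kernel between ensemble members, evaluated on input gradients at x."""
    grads = [normalize(input_gradient(w, v, x)) for (w, v) in members]
    return [[rbf_kernel(gi, gj, bandwidth) for gj in grads] for gi in grads]

def repulsion_penalty(K):
    """Mean off-diagonal similarity; lower values mean more diverse input gradients."""
    n = len(K)
    total = sum(K[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

# Hypothetical three-member ensemble, each member given as (w, v).
members = [
    ([1.0, 0.0], 1.0),
    ([2.0, 0.0], 0.5),  # different weights, same input-gradient direction as member 0
    ([0.0, 1.0], 1.0),  # input gradient orthogonal to member 0
]
x = [0.3, -0.2]

K = gradient_kernel_matrix(members, x)
penalty = repulsion_penalty(K)
```

In this toy setting, members 0 and 1 have different weights but point in the same input-gradient direction, so the kernel treats them as nearly identical, while member 2 is scored as clearly less similar. This is the kind of distinction that a repulsion term computed in weight space would not make.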