Ensemble methods combine the predictions of multiple models to improve performance, but they incur significantly higher computational cost at inference time. To avoid this cost, multiple neural networks can be combined into one by averaging their weights (model soups). However, this usually performs significantly worse than ensembling. Weight averaging is beneficial only when the weights are similar enough (in weight or feature space) to average well, yet different enough for the combination to add value. Based on this idea, we propose PopulAtion Parameter Averaging (PAPA): a method that combines the generality of ensembling with the efficiency of weight averaging. PAPA leverages a population of diverse models (trained on different data orders, augmentations, and regularizations) while occasionally (neither too often nor too rarely) replacing the weights of the networks with the population average of the weights. PAPA reduces the performance gap between averaging and ensembling, increasing the average accuracy of a population of models by up to 1.1% on CIFAR-10, 2.4% on CIFAR-100, and 1.9% on ImageNet compared with training independent (non-averaged) models.
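The averaging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy one-dimensional linear model, the shared zero initialization, the averaging interval `avg_every`, and all function names (`average_population`, `replace_with_average`, `sgd_step`, `train_population`) are assumptions made for illustration. Diversity here comes only from different data orders, whereas the abstract also mentions augmentations and regularizations.

```python
import random


def average_population(population):
    """Element-wise mean of each named parameter across the population."""
    n = len(population)
    return {name: sum(m[name] for m in population) / n for name in population[0]}


def replace_with_average(population):
    """Give every member its own copy of the population-average parameters."""
    avg = average_population(population)
    return [dict(avg) for _ in population]


def sgd_step(params, batch, lr):
    """One SGD step on half the mean squared error of the toy model y = w * x + b."""
    grad_w = grad_b = 0.0
    for x, y in batch:
        err = params["w"] * x + params["b"] - y
        grad_w += err * x / len(batch)
        grad_b += err / len(batch)
    return {"w": params["w"] - lr * grad_w, "b": params["b"] - lr * grad_b}


def train_population(data, n_models=4, steps=200, avg_every=50,
                     lr=0.1, batch_size=8, seed=0):
    """Train a population and occasionally replace all members by their average."""
    # Each member has its own RNG, so it sees the data in a different order
    # (the only source of diversity in this sketch).
    rngs = [random.Random(seed + i) for i in range(n_models)]
    # Assumption: all members start from the same zero initialization.
    population = [{"w": 0.0, "b": 0.0} for _ in range(n_models)]
    for step in range(1, steps + 1):
        population = [
            sgd_step(params, rng.sample(data, batch_size), lr)
            for params, rng in zip(population, rngs)
        ]
        # Occasional averaging: every `avg_every` steps (hypothetical schedule).
        if step % avg_every == 0:
            population = replace_with_average(population)
    return population
```

For example, `train_population([(x / 10, 2 * x / 10 + 0.5) for x in range(-10, 11)])` trains four toy models on points from the line y = 2x + 0.5. Between averaging steps the members drift apart because they sample different mini-batches; at each averaging step they are reset to a common point. How often to average is the trade-off the abstract describes as "neither too often nor too rarely", and the value 50 used here is arbitrary.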