Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark; they directly average the weights of multiple networks despite their nonlinearities. In this paper, we propose Diverse Weight Averaging (DiWA), a new WA strategy whose main motivation is to increase the functional diversity across averaged models. To this end, DiWA averages weights obtained from several independent training runs: indeed, models obtained from different runs are more diverse than those collected along a single run thanks to differences in hyperparameters and training procedures. We motivate the need for diversity by a new bias-variance-covariance-locality decomposition of the expected error, exploiting similarities between WA and standard functional ensembling. Moreover, this decomposition highlights that WA succeeds when the variance term dominates, which we show occurs when the marginal distribution changes at test time. Experimentally, DiWA consistently improves the state of the art on DomainBed without inference overhead.
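To make the averaging step concrete, here is a minimal sketch of uniform weight averaging across several runs. It assumes all runs share one architecture and the same parameter names. The flat `Weights` format and the helpers `average_weights` and `predict` are hypothetical names chosen for illustration, not the authors' implementation. The toy model is linear, so averaging the weights gives exactly the same prediction as averaging the predictions. For nonlinear networks this equality no longer holds exactly, which is why the similarity between WA and functional ensembling needs a separate argument.

```python
from typing import Dict, List, Sequence

# Hypothetical flat format: parameter name -> list of values.
Weights = Dict[str, List[float]]


def average_weights(runs: Sequence[Weights]) -> Weights:
    """Uniformly average parameters from several runs sharing one architecture."""
    if not runs:
        raise ValueError("need at least one set of weights")
    names = runs[0].keys()
    for w in runs:
        if w.keys() != names:
            raise ValueError("all runs must share the same parameter names")
    averaged: Weights = {}
    for name in names:
        # Group the i-th value of this parameter across every run.
        columns = zip(*(w[name] for w in runs))
        averaged[name] = [sum(col) / len(runs) for col in columns]
    return averaged


def predict(weights: Weights, x: Sequence[float]) -> float:
    """Toy linear model, used only to compare WA with prediction ensembling."""
    return sum(wi * xi for wi, xi in zip(weights["w"], x)) + weights["b"][0]


# Three illustrative "runs" (made-up numbers, not experimental results).
runs = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 0.0], "b": [1.0]},
    {"w": [2.0, 4.0], "b": [2.0]},
]

wa = average_weights(runs)  # a single model, so inference costs one forward pass
x = [1.0, 1.0]
ensemble_prediction = sum(predict(w, x) for w in runs) / len(runs)

print(predict(wa, x), ensemble_prediction)
```

The averaged model is a single set of weights, so inference costs the same as for one network, whereas the ensemble needs one forward pass per run.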