Spurious Feature Diversification Improves Out-of-distribution Generalization

Generalization to out-of-distribution (OOD) data is a critical challenge in machine learning. Ensemble-based methods, like weight space ensembles that interpolate model parameters, have been shown to achieve superior OOD performance. However, the underlying mechanism for their effectiveness remains unclear. In this study, we closely examine WiSE-FT, a popular weight space ensemble method that interpolates between a pre-trained and a fine-tuned model. We observe an unexpected phenomenon, in which WiSE-FT successfully corrects many cases where each individual model makes incorrect predictions, which contributes significantly to its OOD effectiveness. To gain further insight, we conduct a theoretical analysis in a multi-class setting with a large number of spurious features. Our analysis predicts the above phenomenon and further shows that ensemble-based models reduce prediction errors in OOD settings by utilizing a more diverse set of spurious features. Contrary to the conventional wisdom that focuses on learning invariant features for better OOD performance, our findings suggest that incorporating a large number of diverse spurious features weakens their individual contributions, leading to improved overall OOD generalization performance. Empirically, we demonstrate the effectiveness of utilizing diverse spurious features on a MultiColorMNIST dataset, and our experimental results are consistent with the theoretical analysis. Building upon the new theoretical insights into the efficacy of ensemble methods, we further identify an issue of WiSE-FT caused by the overconfidence of fine-tuned models in OOD situations. This overconfidence magnifies the fine-tuned model's incorrect predictions, leading to deteriorated OOD ensemble performance. To remedy this problem, we propose a novel method called BAlaNced averaGing (BANG), which significantly enhances the OOD performance of WiSE-FT.
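
As an illustration of the weight space interpolation described above, the following is a minimal sketch of how the parameters of a pre-trained and a fine-tuned model might be combined. It is not the authors' implementation: the function name `interpolate_weights`, the mixing coefficient `alpha`, and the toy parameter dictionaries are assumptions made for this example, and the models are represented simply as dictionaries of NumPy arrays with matching names and shapes.

```python
import numpy as np


def interpolate_weights(pretrained, finetuned, alpha=0.5):
    """Linearly interpolate two parameter dictionaries.

    Hypothetical helper for illustration only: returns
    (1 - alpha) * pretrained + alpha * finetuned for every parameter.
    alpha = 0 recovers the pre-trained model, alpha = 1 the fine-tuned one.
    """
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both models must have the same parameter names")
    return {
        name: (1.0 - alpha) * pretrained[name] + alpha * finetuned[name]
        for name in pretrained
    }


# Toy parameters (assumed values, not taken from the paper).
pretrained = {"w": np.array([1.0, 0.0]), "b": np.array([0.0])}
finetuned = {"w": np.array([3.0, 2.0]), "b": np.array([1.0])}

# Equal-weight interpolation between the two models.
merged = interpolate_weights(pretrained, finetuned, alpha=0.5)
```

With real networks, the same interpolation would be applied to every tensor in the two models' parameter sets, and the choice of `alpha` controls how far the merged model moves from the pre-trained weights toward the fine-tuned ones.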
