Neural networks (NNs) are known to exhibit simplicity bias: they tend to learn 'simple' features in preference to more 'complex' ones, even when the latter are more informative. Simplicity bias can lead the model to make biased predictions that generalize poorly out of distribution (OOD). To address this, we propose a framework that encourages the model to use a more diverse set of features when making predictions. We first train a simple model, and then obtain the final model by training it with a regularizer on its conditional mutual information with the simple model. We demonstrate the effectiveness of this framework across various problem settings and real-world applications, showing that it addresses simplicity bias, leads the model to use more features, enhances OOD generalization, and improves subgroup robustness and fairness. We complement these results with theoretical analyses of the effect of the regularization and of its OOD generalization properties.
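The two-stage recipe above (train a simple model, then penalize the final model's conditional mutual information with it) can be illustrated with a small sketch. The abstract does not specify the estimator, so the following are assumptions: the information is measured between the two models' predicted classes F and G conditioned on the label Y, it is estimated by a plug-in average over soft predictions, and the helper names (`conditional_mutual_information`, `regularized_loss`, `lam`) are hypothetical. The simple model's predictions are treated as fixed inputs.

```python
import math

def conditional_mutual_information(p_f, p_g, labels, eps=1e-12):
    """Hypothetical plug-in estimate of I(F; G | Y) from soft predictions.

    p_f, p_g: lists of probability vectors (final model, frozen simple model).
    labels:   list of integer class labels, one per sample.
    Assumes F and G are independent given the input x, so the joint
    P(F = a, G = b | Y = y) is the average outer product over samples with label y.
    """
    n = len(labels)
    kf, kg = len(p_f[0]), len(p_g[0])
    cmi = 0.0
    for y in set(labels):
        idx = [i for i in range(n) if labels[i] == y]
        weight = len(idx) / n  # empirical P(Y = y)
        # Joint distribution of the two models' predicted classes within class y.
        joint = [[sum(p_f[i][a] * p_g[i][b] for i in idx) / len(idx)
                  for b in range(kg)] for a in range(kf)]
        marg_f = [sum(row) for row in joint]
        marg_g = [sum(joint[a][b] for a in range(kf)) for b in range(kg)]
        mi = 0.0
        for a in range(kf):
            for b in range(kg):
                pab = joint[a][b]
                if pab > eps:  # 0 * log 0 is taken as 0
                    mi += pab * math.log(pab / (marg_f[a] * marg_g[b]))
        cmi += weight * mi  # I(F; G | Y) = sum_y P(y) * I(F; G | Y = y)
    return cmi

def cross_entropy(p_f, labels, eps=1e-12):
    """Average negative log-likelihood of the true labels under the final model."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(p_f, labels)) / len(labels)

def regularized_loss(p_f, p_g, labels, lam=1.0):
    """Hypothetical objective: task loss plus lam * CMI with the simple model."""
    return cross_entropy(p_f, labels) + lam * conditional_mutual_information(p_f, p_g, labels)
```

As a usage check: if the final model copies a simple model whose one-hot predictions vary within each class, the penalty equals log 2; if the final model's predictions carry no information about the simple model's beyond the label, it is zero. Under this reading, minimizing the penalty discourages the final model from relying on the same features as the simple model. A real implementation would compute the term on mini-batches in an autodiff framework so that it can be backpropagated.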