Empirical studies suggest that machine learning models often rely on features, such as the background, that are spuriously correlated with the label only during training, resulting in poor test-time accuracy. In this work, we identify fundamental factors that give rise to this behavior by explaining why models fail this way {\em even} on easy-to-learn tasks where one would expect them to succeed. In particular, through a theoretical study of gradient-descent-trained linear classifiers on certain easy-to-learn tasks, we uncover two complementary failure modes. These modes arise from how spurious correlations induce two kinds of skews in the data: one geometric in nature, and the other statistical. Finally, we construct natural modifications of image classification datasets to understand when these failure modes can arise in practice. We also design experiments to isolate the two failure modes when training modern neural networks on these datasets.