Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we cannot simply enforce invariance to next-lane speed, since it could provide valuable information about an unobservable pedestrian at a crosswalk. Thus, universally ignoring features that are sometimes (but not always) reliable can lead to non-robust performance. We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context. We propose and analyze a two-stage framework called Explicit Non-spurious feature Prediction (ENP) which first identifies the relevant features to use for a given context, then trains a model to rely exclusively on these features. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability.
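The two-stage idea described above (first identify which features are reliable in a given context, then train a model that relies only on those features) can be illustrated with a small toy sketch. This is not the authors' implementation of ENP. It assumes a discrete context id per example, a small set of per-example feature-mask annotations, a linear model, and masking by elementwise multiplication. All function names (`fit_mask_predictor`, `apply_masks`, `fit_masked_linear`, `predict`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_mask_predictor(contexts, annotated_masks):
    """Stage 1 (toy assumption): learn a context -> feature-mask lookup
    by majority vote over the annotated masks seen in each context."""
    return {
        int(c): (annotated_masks[contexts == c].mean(axis=0) >= 0.5).astype(float)
        for c in np.unique(contexts)
    }


def apply_masks(x, contexts, mask_by_context):
    """Zero out the features predicted to be unreliable in each example's context."""
    masks = np.stack([mask_by_context[int(c)] for c in contexts])
    return x * masks


def fit_masked_linear(x, y, contexts, mask_by_context):
    """Stage 2 (toy assumption): least-squares fit on the masked inputs only."""
    masked = apply_masks(x, contexts, mask_by_context)
    w, *_ = np.linalg.lstsq(masked, y, rcond=None)
    return w


def predict(x, contexts, mask_by_context, w):
    return apply_masks(x, contexts, mask_by_context) @ w


# Synthetic data (hypothetical): feature 0 is always reliable, feature 1 is
# reliable only in context 1 and merely correlated with the label in context 0.
rng = np.random.default_rng(0)
n = 200
contexts = rng.integers(0, 2, size=n)
x0 = rng.normal(size=n)
x1_useful = rng.normal(size=n)
y = x0 + contexts * x1_useful
x1 = np.where(contexts == 1, x1_useful, y + 0.1 * rng.normal(size=n))
x = np.column_stack([x0, x1])

# Assumed annotations: 1 marks a feature as reliable for that example.
annotated_masks = np.column_stack([np.ones(n), contexts.astype(float)])

mask_by_context = fit_mask_predictor(contexts, annotated_masks)
w = fit_masked_linear(x, y, contexts, mask_by_context)

# In context 0 the second feature is ignored, however extreme its value.
pred_ignore = predict(np.array([[0.5, 100.0]]), np.array([0]), mask_by_context, w)
# In context 1 the same feature is used.
pred_use = predict(np.array([[0.5, 2.0]]), np.array([1]), mask_by_context, w)
```

In this toy setting the masked model cannot pick up the context-0 correlation in the second feature, because that feature is zeroed out before fitting, yet it still uses the feature in context 1 where it carries real signal. A model forced to be invariant to the second feature everywhere would lose that signal, which is the failure mode the abstract warns about.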