Machine learning is a data-driven field, and the quality of the underlying datasets plays a crucial role in learning success. However, high performance on held-out test data does not necessarily indicate that a model generalizes or has learned anything meaningful. A common cause is the existence of machine learning shortcuts: features in the data that are predictive but unrelated to the problem at hand. For datasets in which the shortcuts are smaller and more localized than the true features, we propose a novel approach to detect and remove them. We use an adversarially trained lens to detect and eliminate highly predictive but semantically unrelated cues in images. In experiments on both synthetic and real-world data, we show that the proposed approach reliably identifies and neutralizes such shortcuts without degrading model performance on clean data. We believe that this approach can lead to more meaningful and generalizable machine learning models, especially in scenarios where the quality of the underlying datasets is crucial.