For real-world applications of machine learning (ML), it is essential that models base their predictions on features that generalize well rather than on spurious correlations in the data. Identifying such spurious correlations, also known as shortcuts, is a challenging problem that has received little attention so far. In this work, we present a novel approach for detecting shortcuts in image and audio datasets using variational autoencoders (VAEs). Because VAEs disentangle features in their latent space, we can discover correlations in a dataset and semi-automatically evaluate whether they act as ML shortcuts. We demonstrate the applicability of our method on several real-world datasets and identify shortcuts that have not been reported before. Based on these findings, we also investigate the construction of shortcut adversarial examples.
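The idea of searching a VAE latent space for label-correlated features can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes the latent codes have already been produced by some trained VAE encoder (replaced here by synthetic data), and it uses the correlation ratio (between-class variance divided by total variance) as a stand-in score. The function name `rank_latent_dims` and all parameters are hypothetical.

```python
import numpy as np


def rank_latent_dims(latents, labels):
    """Score each latent dimension by the share of its variance explained by the class label.

    Hypothetical helper: `latents` is an (n_samples, n_dims) array of latent codes,
    `labels` an (n_samples,) array of class labels. Returns (order, scores), where
    `order` lists dimension indices from most to least label-correlated.
    """
    latents = np.asarray(latents, dtype=float)
    labels = np.asarray(labels)
    overall_mean = latents.mean(axis=0)
    # Total sum of squares per dimension.
    total = ((latents - overall_mean) ** 2).sum(axis=0)
    # Between-class sum of squares per dimension.
    between = np.zeros(latents.shape[1])
    for c in np.unique(labels):
        group = latents[labels == c]
        between += len(group) * (group.mean(axis=0) - overall_mean) ** 2
    # Correlation ratio in [0, 1]; constant dimensions get a score of 0.
    scores = np.divide(between, total, out=np.zeros_like(between), where=total > 0)
    order = np.argsort(-scores)
    return order, scores


# Synthetic stand-in for encoder output (assumption): 8 latent dimensions,
# of which only dimension 3 is shifted by the class label.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
latents = rng.normal(size=(500, 8))
latents[:, 3] += 3.0 * labels

order, scores = rank_latent_dims(latents, labels)
print("most label-correlated latent dimension:", order[0])
```

A high score only flags a dimension as a candidate. Whether it encodes a shortcut or a legitimate feature still has to be judged by inspecting what the dimension represents, which matches the semi-automatic evaluation described in the abstract.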