Out-of-distribution generalization in neural networks is often hampered by spurious correlations. A common mitigation strategy is to remove spurious concepts from the network's representation of the data. Existing concept-removal methods tend to be overzealous: they inadvertently eliminate features associated with the model's main task, which harms performance. We propose an iterative algorithm that separates spurious concepts from main-task concepts by jointly identifying two low-dimensional, mutually orthogonal subspaces in the neural network representation. We evaluate the algorithm on benchmark datasets for computer vision (Waterbirds, CelebA) and natural language processing (MultiNLI), and show that it outperforms existing concept-removal methods.
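The abstract does not spell out the algorithm, so the following is only a minimal sketch of why orthogonality between the two subspaces matters. It assumes the main-task and spurious directions are already given (the proposed method instead identifies them iteratively) and uses a plain QR-based projection, not the authors' procedure. The helper names `orthonormal_basis`, `separate_subspaces` and `remove_concept` are hypothetical. Removing a spurious direction that overlaps the main-task direction also damages the main-task coordinate, while first making it orthogonal to the main-task subspace leaves that coordinate intact.

```python
import numpy as np


def orthonormal_basis(directions: np.ndarray) -> np.ndarray:
    """Return an orthonormal basis (as columns) for the span of the given column vectors."""
    q, _ = np.linalg.qr(directions)
    return q


def separate_subspaces(main_dirs: np.ndarray, spurious_dirs: np.ndarray):
    """Hypothetical helper: make the spurious subspace orthogonal to the main-task subspace.

    Each spurious direction loses its component inside span(main_dirs) before being
    orthonormalized, so later removal cannot touch main-task features.
    """
    q_main = orthonormal_basis(main_dirs)
    residual = spurious_dirs - q_main @ (q_main.T @ spurious_dirs)
    q_spur = orthonormal_basis(residual)
    return q_main, q_spur


def remove_concept(x: np.ndarray, q_spur: np.ndarray) -> np.ndarray:
    """Project each row of x onto the orthogonal complement of span(q_spur)."""
    return x - (x @ q_spur) @ q_spur.T


# Toy 3-d representation space (assumed for illustration only):
# the main-task concept lies along e1, and the spurious direction is entangled with it.
main = np.array([[1.0], [0.0], [0.0]])
spurious = np.array([[1.0], [1.0], [0.0]])
x = np.array([[2.0, 3.0, 4.0]])

# Naive removal of the raw spurious direction shifts the main-task coordinate (2.0 -> about -0.5).
naive = remove_concept(x, orthonormal_basis(spurious))

# Removal after separating the subspaces zeroes only the second coordinate: [[2.0, 0.0, 4.0]].
q_main, q_spur = separate_subspaces(main, spurious)
cleaned = remove_concept(x, q_spur)
```

In this toy case the separated spurious subspace is exactly the e2 axis, so only the purely spurious component is erased.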