Recent research has revealed that deep neural networks often exploit dataset biases as shortcuts for making decisions rather than understanding the task, which leads to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased distribution of the training data. In particular, we define a word that co-occurs highly with a specific label as a biased word, and an example containing a biased word as a biased example. Our analysis shows that biased examples are easier for models to learn and that, at prediction time, biased words contribute significantly more to the models' predictions; models tend to assign labels by over-relying on the spurious correlation between words and labels. To mitigate this over-reliance on the shortcut (i.e., the spurious correlation), we propose a training strategy, Less-Learn-Shortcut (LLS), which quantifies the degree of bias of each biased example and down-weights it accordingly. Experimental results on Question Matching, Natural Language Inference, and Sentiment Analysis tasks show that LLS is a task-agnostic strategy that improves model performance on adversarial data while maintaining good performance on in-domain data.
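The down-weighting idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the formulas, so the definition of bias degree (the share of a word's occurrences that carry its most frequent label), the thresholds `min_count` and `threshold`, the weight `1 - alpha * degree`, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def word_bias_degrees(examples, min_count=3, threshold=0.8):
    """Find biased words in (text, label) pairs.

    Assumption: a word's bias degree is the fraction of the examples
    containing it that carry its most frequent label. Words seen in fewer
    than `min_count` examples are ignored as too rare to judge.
    """
    word_label = defaultdict(Counter)
    for text, label in examples:
        for word in set(text.lower().split()):
            word_label[word][label] += 1

    biased = {}
    for word, counts in word_label.items():
        total = sum(counts.values())
        if total < min_count:
            continue
        label, top = counts.most_common(1)[0]
        degree = top / total
        if degree >= threshold:
            biased[word] = (label, degree)
    return biased


def example_weights(examples, biased, alpha=0.5):
    """Down-weight each example by the largest bias degree among its words.

    Assumption: weight = 1 - alpha * degree, so unbiased examples keep
    weight 1.0 and the most biased ones drop to 1 - alpha.
    """
    weights = []
    for text, _label in examples:
        degrees = [biased[w][1] for w in set(text.lower().split()) if w in biased]
        weights.append(1.0 - alpha * max(degrees, default=0.0))
    return weights


def weighted_loss(per_example_losses, weights):
    """Weighted mean of per-example losses, normalized by the weight sum."""
    total = sum(w * l for w, l in zip(weights, per_example_losses))
    return total / sum(weights)


# Toy sentiment data: "terrible" appears only with label 0.
toy = [
    ("terrible movie", 0), ("terrible plot", 0), ("terrible acting", 0),
    ("great movie", 1), ("fine plot", 1), ("good acting overall", 1),
]
biased = word_bias_degrees(toy)
weights = example_weights(toy, biased)
```

In this toy set only "terrible" qualifies as a biased word, so the three examples containing it receive a reduced weight while the rest keep full weight. In an actual training loop the weights would multiply the per-example losses before averaging, as `weighted_loss` does.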