Recent research has revealed that deep learning models tend to leverage spurious correlations that exist in the training set but may not hold in general. For instance, a sentiment classifier may erroneously learn that the token "performances" is commonly associated with positive movie reviews. Relying on these spurious correlations degrades the classifier's performance when it is deployed on out-of-distribution data. In this paper, we examine the implications of spurious correlations through a novel perspective called neighborhood analysis. The analysis uncovers how spurious correlations lead unrelated words to erroneously cluster together in the embedding space. Driven by the analysis, we design a metric to detect spurious tokens and propose a family of regularization methods, NFL (doN't Forget your Language), to mitigate spurious correlations in text classification. Experiments show that NFL can effectively prevent erroneous clusters and significantly improve the robustness of classifiers.
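To make the idea of a token's neighborhood concrete, the following is a minimal sketch of a nearest-neighbor lookup in an embedding space. It is not the metric or the procedure used in this paper: the function `nearest_neighbors`, the five-word vocabulary, and the 2-D vectors are all hypothetical values chosen for illustration, and cosine similarity is an assumed choice of distance. The sketch only shows how a token such as "performances" could end up with sentiment words as its neighbors after its embedding drifts during fine-tuning.

```python
import numpy as np


def nearest_neighbors(token, vocab, embeddings, k=2):
    """Return the k tokens whose embeddings are most cosine-similar to `token`."""
    # Normalize rows so that a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    idx = vocab.index(token)
    sims = normed @ normed[idx]
    sims[idx] = -np.inf  # exclude the query token itself
    top = np.argsort(-sims)[:k]
    return [vocab[i] for i in top]


# Hypothetical toy vocabulary and 2-D embeddings (illustrative only).
vocab = ["performances", "acting", "scenes", "great", "wonderful"]

# Assumed state before fine-tuning: film-related nouns sit together.
pretrained = np.array([
    [1.0, 0.1],  # performances
    [0.9, 0.2],  # acting
    [0.8, 0.0],  # scenes
    [0.1, 1.0],  # great
    [0.0, 0.9],  # wonderful
])

# Assumed state after fine-tuning on biased data: "performances"
# has drifted toward the positive-sentiment words.
finetuned = pretrained.copy()
finetuned[0] = [0.2, 0.9]

before = nearest_neighbors("performances", vocab, pretrained)
after = nearest_neighbors("performances", vocab, finetuned)
print("neighbors before fine-tuning:", before)
print("neighbors after fine-tuning: ", after)
```

Comparing the two neighbor lists for the same token is one simple way to see whether fine-tuning has pulled it into a cluster of semantically unrelated words.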