Language models (LMs) have achieved notable success on many NLP tasks through both fine-tuning and in-context learning (ICL). Despite this strong performance, they face robustness challenges caused by spurious correlations, which arise from imbalanced label distributions in training data or ICL exemplars. Previous research has concentrated on word, phrase, and syntax features and has largely neglected the concept level, often because concept labels are unavailable and the conceptual content of input texts is difficult to identify. This paper makes two main contributions. First, we use ChatGPT to assign concept labels to texts, and we use these labels to assess the concept bias that fine-tuned and ICL models exhibit on test data. We find that when LMs encounter a spurious correlation between a concept and a label in the training data or the prompt, they resort to shortcuts when making predictions. Second, we introduce a data rebalancing technique that adds ChatGPT-generated counterfactual data to balance the label distribution and mitigate spurious correlations. Extensive testing shows that our method is effective and outperforms traditional token removal approaches.
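The rebalancing idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `concept_label_counts` and `rebalance` are hypothetical, the concept labels and counterfactual texts are hand-written placeholders standing in for ChatGPT output, and the balancing rule (top up each concept's minority labels toward the count of its most frequent label) is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def concept_label_counts(examples):
    """Count how often each label occurs for each concept.

    `examples` is an iterable of (text, concept, label) tuples.
    """
    counts = defaultdict(Counter)
    for _text, concept, label in examples:
        counts[concept][label] += 1
    return counts


def rebalance(examples, counterfactuals):
    """Return the examples plus enough counterfactuals to reduce label skew.

    Assumed rule (not taken from the paper): for every concept, add
    counterfactuals for under-represented labels until they match the
    count of that concept's most frequent label, or until the pool of
    available counterfactuals runs out.
    """
    counts = concept_label_counts(examples)
    labels = sorted({label for _text, _concept, label in examples})

    # Group the available counterfactuals by (concept, label).
    pool = defaultdict(list)
    for example in counterfactuals:
        pool[(example[1], example[2])].append(example)

    balanced = list(examples)
    for concept, label_counts in counts.items():
        target = max(label_counts.values())
        for label in labels:
            deficit = target - label_counts[label]
            balanced.extend(pool[(concept, label)][:deficit])
    return balanced


# Placeholder data: the concept "food" co-occurs only with "pos",
# and the concept "service" only with "neg".
train = [
    ("The food was great", "food", "pos"),
    ("Really tasty food", "food", "pos"),
    ("Delicious food again", "food", "pos"),
    ("The service was slow", "service", "neg"),
]
# Placeholder counterfactuals (in the paper these are generated by ChatGPT).
counterfactuals = [
    ("The food was bland", "food", "neg"),
    ("Really tasteless food", "food", "neg"),
    ("The service was quick", "service", "pos"),
]

balanced = rebalance(train, counterfactuals)
balanced_counts = concept_label_counts(balanced)
```

In this toy data, the per-concept counts before rebalancing expose the skew that a model could exploit as a shortcut. After rebalancing, each concept appears with both labels, although "food" remains slightly skewed because only two negative counterfactuals are available.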