Language models (LMs) have achieved strong results across a wide range of NLP tasks with both fine-tuning and in-context learning (ICL). Despite this performance, evidence shows that spurious correlations caused by imbalanced label distributions in the training data (or in the ICL exemplars) lead to robustness issues. Previous studies, however, mostly focus on word- and phrase-level features and do not address the problem at the concept level, partly because concept labels are lacking and partly because concepts are expressed in subtle and diverse ways in text. In this paper, we first use a large language model (LLM) to label the concept of each text and then measure the concept bias of fine-tuned and ICL models on the test data. Second, we propose a data rebalancing method that mitigates the spurious correlations by adding LLM-generated counterfactual data so that the label distribution for each concept is balanced. We verify the effectiveness of this mitigation method and show that it outperforms token removal. Overall, our results show that label distribution biases in concepts exist across multiple text classification datasets, and that LMs exploit these shortcuts to make predictions under both fine-tuning and ICL.
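To make the rebalancing idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes each text has already been assigned a concept (for example by an LLM) and a class label. It counts labels per concept and reports how many counterfactual examples per label would be needed to bring every label up to the most frequent one. The function names, the toy data, and the "match the majority count" target are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def concept_label_counts(examples):
    """Count labels per concept from (concept, label) pairs.

    The pairs are assumed to come from an upstream concept-labeling step.
    """
    counts = defaultdict(Counter)
    for concept, label in examples:
        counts[concept][label] += 1
    return counts


def counterfactuals_needed(counts, labels):
    """For each concept, return how many examples of each label to add
    so that every label reaches the count of the most frequent label.

    This balancing target is an assumption made for illustration.
    """
    needed = {}
    for concept, label_counts in counts.items():
        target = max(label_counts[label] for label in labels)
        needed[concept] = {
            label: target - label_counts[label]
            for label in labels
            if label_counts[label] < target
        }
    return needed


# Hypothetical toy data: "food" is skewed toward "pos", "service" is balanced.
toy = [
    ("food", "pos"), ("food", "pos"), ("food", "pos"), ("food", "neg"),
    ("service", "neg"), ("service", "pos"),
]
counts = concept_label_counts(toy)
needed = counterfactuals_needed(counts, ["pos", "neg"])
print(needed)
```

In this toy case the skewed concept would need two additional negative examples, while the balanced concept would need none. Generating those counterfactual texts is a separate step that this sketch does not cover.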