Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) tasks. Downstream tasks such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging are known to suffer from data imbalance issues, particularly in the ratio of positive to negative examples and in class imbalance. In this paper, we investigate an additional issue specific to language models: the position bias of positive examples in token classification tasks. We conduct an in-depth evaluation of the impact of position bias on the performance of LMs fine-tuned on token classification benchmarks. Our study covers CoNLL03 and OntoNotes 5.0 for NER, and the English Treebank UD_en and TweeBank for POS tagging. We propose an evaluation approach to investigate position bias in Transformer models. We show that encoders such as BERT, ERNIE, and ELECTRA, and decoders such as GPT2 and BLOOM, can suffer from this bias, with an average drop of 3\% and 9\% in their performance. To mitigate this effect, we propose two methods, Random Position Shifting and Context Perturbation, which we apply to batches during training. The results show an improvement of $\approx$ 2\% in model performance on CoNLL03, UD_en, and TweeBank.
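The abstract names Random Position Shifting but does not describe how it works. One plausible reading, offered only as a sketch, is that each sequence in a training batch receives position ids starting at a random offset instead of always starting at 0, so that positive tokens are seen at many absolute positions. The function name `random_position_shift`, the `max_positions` parameter, and the uniform sampling of the offset are all assumptions made for illustration, not details taken from the paper.

```python
import random


def random_position_shift(seq_lengths, max_positions, rng=None):
    """Build shifted position ids for a batch (hypothetical helper).

    seq_lengths:   number of tokens in each sequence of the batch.
    max_positions: size of the model's position table (assumed known).
    rng:           optional random.Random instance for reproducibility.
    """
    if rng is None:
        rng = random.Random()

    batch_position_ids = []
    for length in seq_lengths:
        if length > max_positions:
            raise ValueError("sequence is longer than the position table")
        # Largest offset that keeps every position id inside the table.
        max_offset = max_positions - length
        # Assumption: the offset is drawn uniformly, once per sequence.
        offset = rng.randint(0, max_offset)
        batch_position_ids.append(list(range(offset, offset + length)))
    return batch_position_ids


if __name__ == "__main__":
    shifted = random_position_shift([5, 3], max_positions=512,
                                    rng=random.Random(0))
    for row in shifted:
        # Each row is a contiguous run of ids starting at a random offset.
        print(row)
```

In a fine-tuning loop, ids of this kind would be regenerated for every batch and passed to the model in place of its default positions, for models that accept explicit position ids. Whether the authors shift positions in this way, or instead shift the tokens themselves by padding, is not stated in the abstract.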