Token representations from bidirectional language models (LMs) such as BERT remain a widely used approach for token-classification tasks. Even though much larger unidirectional LMs such as Llama-2 exist, they are rarely used to replace the token representations of bidirectional LMs. In this work, we hypothesize that their lack of bidirectionality is holding them back. To address this, we propose training a new, small backward LM and concatenating its representations with those of an existing LM for downstream tasks. Through experiments on named entity recognition, we demonstrate that introducing the backward model improves benchmark performance by more than 10 points. Furthermore, we show that the proposed method is especially effective for rare domains and in few-shot learning settings.
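The core operation described above, combining per-token representations from a forward (left-to-right) LM and a newly trained backward (right-to-left) LM by concatenation along the feature dimension, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, hidden sizes, and the assumption that both models are already aligned to the same tokenization are all illustrative.

```python
import numpy as np

def concat_token_representations(forward_hidden, backward_hidden):
    """Concatenate per-token hidden states from a forward LM and a
    backward LM along the feature axis.

    forward_hidden:  (seq_len, d_fwd) array from the existing forward LM
    backward_hidden: (seq_len, d_bwd) array from the small backward LM
    returns:         (seq_len, d_fwd + d_bwd) array fed to the
                     downstream token classifier
    """
    # Both models must produce one vector per token of the same sequence.
    assert forward_hidden.shape[0] == backward_hidden.shape[0]
    return np.concatenate([forward_hidden, backward_hidden], axis=-1)

# Toy sizes: a large forward LM (4096-dim) and a small backward LM
# (512-dim); these numbers are hypothetical, chosen only for the demo.
fwd = np.random.randn(8, 4096)
bwd = np.random.randn(8, 512)
combined = concat_token_representations(fwd, bwd)
print(combined.shape)  # (8, 4608)
```

The concatenated vectors would then serve as input features to a standard token-classification head (e.g., a linear layer per token for NER tags).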