Pre-trained Language Models (PLMs) have proven consistently successful across a wide range of NLP tasks, owing to their ability to learn contextualized representations of words (Ethayarajh, 2019). BERT (Devlin et al., 2018), ELMo (Peters et al., 2018), and other PLMs encode word meaning via textual context, in contrast to static word embeddings, which encode all meanings of a word in a single vector representation. In this work, we present a study that aims to localize where exactly in a PLM word contextualization happens. To locate this transformation of word meaning, we use qualitative and quantitative measures to investigate the representations of polysemous words in the base uncased 12-layer BERT architecture (Devlin et al., 2018), a masked language model trained with an additional sentence adjacency objective.