Dynamic contextualised word embeddings (DCWEs) represent the temporal semantic variations of words. We propose a method for learning DCWEs by time-adapting a pretrained Masked Language Model (MLM) using time-sensitive templates. Given two snapshots $C_1$ and $C_2$ of a corpus, taken at two distinct timestamps $T_1$ and $T_2$ respectively, we first propose an unsupervised method to select (a) \emph{pivot} terms related to both $C_1$ and $C_2$, and (b) \emph{anchor} terms that are associated with a specific pivot term in each individual snapshot. We then generate prompts by filling manually compiled templates with the extracted pivot and anchor terms. We also propose an automatic method to learn time-sensitive templates from $C_1$ and $C_2$ without requiring any human supervision. Finally, we adapt a pretrained MLM to $T_2$ by fine-tuning it on the generated prompts. Multiple experiments show that our proposed method reduces the perplexity of test sentences in $C_2$, outperforming current state-of-the-art methods.
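The prompt-generation step can be illustrated with a minimal sketch. It assumes the pivot and anchor terms have already been selected. The template wording, the function name `generate_prompts`, and the example terms are all hypothetical, chosen only to show how a template with slots for a pivot, two anchors, and two timestamps might be filled. The selection method and the fine-tuning step are not shown.

```python
from itertools import product

# Hypothetical manually written template (an assumption for illustration).
# Slots: the pivot term, one anchor per snapshot, and the two timestamps.
TEMPLATE = (
    "{pivot} is associated with {anchor_1} in {t1}, "
    "whereas it is associated with {anchor_2} in {t2}."
)


def generate_prompts(pivot, anchors_t1, anchors_t2, t1, t2, templates=(TEMPLATE,)):
    """Fill every template with every (anchor at T1, anchor at T2) pair."""
    prompts = []
    for template, a1, a2 in product(templates, anchors_t1, anchors_t2):
        prompts.append(
            template.format(pivot=pivot, anchor_1=a1, anchor_2=a2, t1=t1, t2=t2)
        )
    return prompts


# Illustrative terms only; these are not taken from any reported experiment.
prompts = generate_prompts(
    pivot="mask",
    anchors_t1=["hide"],
    anchors_t2=["vaccine", "wear"],
    t1="2010",
    t2="2020",
)
for p in prompts:
    print(p)
```

Each resulting string could then serve as a fine-tuning sentence for the MLM, with tokens masked according to the usual MLM objective.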