Semantic Change Detection (SCD) of words is an important task for various NLP applications that must make time-sensitive predictions. Over time, some words come to be used in novel ways to express new meanings, and these new meanings establish themselves as novel senses of existing words. Word Sense Disambiguation (WSD) methods, meanwhile, associate ambiguous words with sense ids according to the context in which they occur. Given this relationship between WSD and SCD, we explore whether we can predict if the meaning of a target word has changed between two corpora collected at different time steps by comparing the distributions of that word's senses in each corpus. For this purpose, we use pretrained static sense embeddings to automatically annotate each occurrence of the target word in a corpus with a sense id. Next, we compute the distribution of sense ids of the target word in each corpus. Finally, we use different divergence or distance measures to quantify the semantic change of the target word across the two corpora. Our experimental results on the SemEval 2020 Task 1 dataset show that word sense distributions can be used to accurately predict semantic changes of words in English, German, Swedish and Latin.
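The three-step pipeline above (sense annotation, sense distribution, divergence) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each occurrence of the target word has already been mapped to a vector in the same space as the sense embeddings (in practice this would likely come from a contextualised encoder), and that annotation picks the sense whose embedding has the highest cosine similarity to that vector. The sense ids `plane%aircraft` and `plane%surface` and the 2-d vectors are hypothetical toy data. Jensen-Shannon divergence and L1 distance are shown as two example measures among the "different divergence or distance measures" mentioned above.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def annotate(occurrence_vectors, sense_embeddings):
    """Assumed annotation step: label each occurrence with its nearest sense id."""
    return [
        max(sense_embeddings, key=lambda s: cosine(vec, sense_embeddings[s]))
        for vec in occurrence_vectors
    ]


def sense_distribution(sense_ids, all_senses):
    """Relative frequency of each sense id, in the fixed order given by all_senses."""
    if not sense_ids:
        raise ValueError("need at least one annotated occurrence")
    counts = Counter(sense_ids)
    return [counts[s] / len(sense_ids) for s in all_senses]


def kl(p, q):
    """KL divergence in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and finite even when supports differ."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def l1_distance(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical toy data: two senses of "plane", 2-d vectors.
sense_embeddings = {"plane%aircraft": [1.0, 0.0], "plane%surface": [0.0, 1.0]}
senses = sorted(sense_embeddings)
corpus_t1 = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]     # mostly "surface" usages
corpus_t2 = [[0.9, 0.2], [0.8, 0.1], [0.95, 0.05]]   # only "aircraft" usages

p = sense_distribution(annotate(corpus_t1, sense_embeddings), senses)
q = sense_distribution(annotate(corpus_t2, sense_embeddings), senses)
score = js_divergence(p, q)
print(round(score, 3))  # prints 0.318
```

A higher score indicates a larger shift in how the target word's senses are distributed between the two time periods. Jensen-Shannon divergence is a convenient default here because plain KL divergence becomes infinite when a sense appears in one corpus but not the other.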