Automatic semantic change detection methods aim to identify how the meanings of words change over time by analyzing their usage in diachronic corpora. In this paper, we analyze different strategies for building static and contextual word embedding models, namely Word2Vec and ELMo, on real-world English and Romanian datasets. To test our pipeline and measure the performance of our models, we first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA). We then focus our experiments on a Romanian dataset and highlight different aspects of semantic change in this low-resource language, such as the acquisition and loss of meanings. The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance measure used to compute the semantic change score.
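As a rough illustration of how a distance can be turned into a semantic change score, the sketch below compares a word's embeddings from two time periods. It is not the pipeline used in this paper: the abstract does not name the distances evaluated, so cosine distance (for static vectors) and average pairwise cosine distance (for contextual vectors) are assumed here as common choices. The function names and the toy vectors are hypothetical, and the static vectors are assumed to already live in a shared, aligned space.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_pairwise_distance(usages_t1, usages_t2):
    """Mean cosine distance over all cross-period pairs of usage vectors."""
    distances = [cosine_distance(u, v) for u in usages_t1 for v in usages_t2]
    return sum(distances) / len(distances)


# Static case (Word2Vec-style): one vector per word and period.
# Toy values; assumed to be aligned across periods beforehand.
static_t1 = [1.0, 0.0]
static_t2 = [0.0, 1.0]
static_score = cosine_distance(static_t1, static_t2)

# Contextual case (ELMo-style): one vector per occurrence of the word.
contextual_t1 = [[1.0, 0.0], [1.0, 0.0]]
contextual_t2 = [[1.0, 0.0], [0.0, 1.0]]
contextual_score = average_pairwise_distance(contextual_t1, contextual_t2)

print(f"static score: {static_score:.2f}")
print(f"contextual score: {contextual_score:.2f}")
```

A higher score suggests a larger shift in usage between the two periods. Swapping in a different distance changes only the scoring function, which is one way to compare distances on the same set of embeddings.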