We use contextualized word definitions generated by large language models as semantic representations for diachronic lexical semantic change detection (LSCD). In short, generated definitions are used as `senses', and the change score of a target word is obtained by comparing the distributions of these definitions in the two time periods. Using five datasets across three languages, we show that generated definitions are both specific and general enough to convey a signal sufficient to rank sets of words by their degree of semantic change over time. Our approach is on par with or outperforms prior unsupervised sense-based LSCD methods. At the same time, it preserves interpretability and makes it possible to inspect the reasons behind a specific shift in terms of discrete definitions-as-senses. This is another step toward explainable semantic change modeling.
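As a minimal sketch of the scoring step described above, the following Python code treats each generated definition as a discrete sense label, builds the distribution of labels in each time period, and compares the two distributions. The abstract does not name the comparison measure, so the use of Jensen-Shannon distance here is an assumption, as are the function names and the hand-written example definitions; in practice the definitions would come from a language model and would likely need merging or normalization before counting.

```python
from collections import Counter
from math import log2, sqrt


def sense_distribution(definitions, vocabulary):
    """Relative frequency of each definition (used as a sense label)."""
    if not definitions:
        raise ValueError("each time period needs at least one definition")
    counts = Counter(definitions)
    total = len(definitions)
    return [counts[d] / total for d in vocabulary]


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base 2), bounded between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    # max() guards against tiny negative values from rounding.
    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def change_score(defs_period1, defs_period2):
    """Assumed change score: distance between the two sense distributions."""
    vocabulary = sorted(set(defs_period1) | set(defs_period2))
    p = sense_distribution(defs_period1, vocabulary)
    q = sense_distribution(defs_period2, vocabulary)
    return jensen_shannon_distance(p, q)


# Hypothetical definitions, one per usage of the target word in each period.
usages = {
    "cell": (
        ["a small room", "a small room", "a unit of an organism"],
        ["a mobile phone", "a unit of an organism", "a mobile phone"],
    ),
    "tree": (
        ["a woody plant", "a woody plant"],
        ["a woody plant", "a woody plant", "a woody plant"],
    ),
}

# Rank target words from most to least changed.
ranking = sorted(usages, key=lambda w: change_score(*usages[w]), reverse=True)
print(ranking)
```

Because the senses are plain definition strings, the definitions whose relative frequency differs most between the two periods can be read off directly, which is the kind of inspection the abstract refers to.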