Semantic Shift Detection (SSD) is the task of identifying, interpreting, and assessing possible changes over time in the meanings of a target word. Traditionally, SSD has been addressed by linguists and social scientists through manual, time-consuming work. In recent years, computational approaches based on Natural Language Processing and word embeddings have gained increasing attention as a way to automate SSD as much as possible. In particular, over the past three years, significant advances have been made almost exclusively with contextualised word embedding models, which can handle the multiple usages/meanings of a word and better capture the related semantic shifts. In this paper, we survey approaches to SSD based on contextualised embeddings (i.e., CSSDetection) and propose a classification framework characterised by three dimensions: meaning representation, time-awareness, and learning modality. We use the framework i) to review the measures for shift assessment, ii) to compare the approaches in terms of performance, and iii) to discuss current issues of scalability, interpretability, and robustness. Finally, we outline open challenges and future research directions for CSSDetection.