Words change their meaning over time as well as across contexts. Sense-aware contextualised word embeddings (SCWEs), such as those produced by XL-LEXEME by fine-tuning masked language models (MLMs) on Word-in-Context (WiC) data, attempt to encode such semantic changes of words within the contextualised word embedding (CWE) space. Despite the superior performance of SCWEs on contextual/temporal semantic change detection (SCD) benchmarks, it remains unclear how meaning changes are encoded in the embedding space. To study this, we compare pre-trained CWEs and their fine-tuned versions on contextual and temporal semantic change benchmarks under Principal Component Analysis (PCA) and Independent Component Analysis (ICA) transformations. Our experimental results reveal several novel insights: (a) although a small number of axes are responsible for the semantic changes of words in the pre-trained CWE space, this information gets distributed across all dimensions when fine-tuned, and (b) in contrast to prior work studying the geometry of CWEs, we find that PCA better represents semantic changes than ICA. Source code is available at https://github.com/LivNLP/svp-dims .
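The kind of axis-level analysis described above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' exact pipeline) using scikit-learn's PCA and FastICA on synthetic data, where `embeddings` stands in for CWEs extracted from a pre-trained or fine-tuned model and `changed` for hypothetical semantic-change labels:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Synthetic stand-in for contextualised word embeddings (n_words x dim).
embeddings = rng.normal(size=(200, 64))
# Hypothetical binary labels: 1 if the word's sense changed across contexts.
changed = rng.integers(0, 2, size=200)

def axis_correlations(transformed, labels):
    """Absolute Pearson correlation of each transformed axis with the labels."""
    labels = labels - labels.mean()
    corrs = []
    for axis in transformed.T:
        a = axis - axis.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(labels)
        corrs.append(abs(a @ labels / denom) if denom else 0.0)
    return np.array(corrs)

# Project embeddings onto PCA and ICA axes, then ask how concentrated
# the change signal is: a few strong axes vs. spread across many.
pca_axes = PCA(n_components=10).fit_transform(embeddings)
ica_axes = FastICA(n_components=10, random_state=0).fit_transform(embeddings)

pca_corrs = axis_correlations(pca_axes, changed)
ica_corrs = axis_correlations(ica_axes, changed)
print("strongest PCA axis |r|:", pca_corrs.max())
print("strongest ICA axis |r|:", ica_corrs.max())
```

On real SCD data, a concentrated signal shows up as a few axes with high correlation, whereas a distributed signal yields uniformly modest correlations across all axes.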