Lexical semantics is concerned both with the multiple senses a word can adopt in different contexts and with the semantic relations that hold between the meanings of different words. Contextualized language models are a valuable tool for investigating these phenomena, as they provide context-sensitive representations that can be used to study lexical meaning. Recent work such as XL-LEXEME has leveraged the Word-in-Context task to fine-tune such models toward more semantically accurate representations, but Word-in-Context only compares occurrences of the same lemma, limiting the range of information captured. In this paper, we propose an extension of this task, Concept Differentiation, that covers inter-word scenarios. We provide a dataset for this task, derived from SemCor data, and fine-tune several representation models on it. We call the resulting models Concept-Aligned Embeddings (CALE). By evaluating our models against others on various lexical semantic tasks, we demonstrate that the proposed models provide efficient multi-purpose representations of lexical meaning that achieve the best performance in our experiments. We also show that CALE's fine-tuning brings valuable changes to the spatial organization of the embedding space.