Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs - 专知论文

会员服务 ·

0

迁移学习 · Learning · 图 · ngram · 可辨认的 ·

2023 年 5 月 22 日

Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs

翻译：基于多语同词异义图的低资源语言跨语言迁移学习

Yihong Liu,Haotian Ye,Leonie Weissweiler,Hinrich Schütze

Colexification in comparative linguistics refers to the phenomenon of a lexical form conveying two or more distinct meanings. In this paper, we propose simple and effective methods to build multilingual graphs from colexification patterns: ColexNet and ColexNet+. ColexNet's nodes are concepts and its edges are colexifications. In ColexNet+, concept nodes are in addition linked through intermediate nodes, each representing an ngram in one of 1,334 languages. We use ColexNet+ to train high-quality multilingual embeddings $\overrightarrow{\mbox{ColexNet+}}$ that are well-suited for transfer learning scenarios. Existing work on colexification patterns relies on annotated word lists. This limits scalability and usefulness in NLP. In contrast, we identify colexification patterns of more than 2,000 concepts across 1,335 languages directly from an unannotated parallel corpus. In our experiments, we first show that ColexNet has a high recall on CLICS, a dataset of crosslingual colexifications. We then evaluate $\overrightarrow{\mbox{ColexNet+}}$ on roundtrip translation, verse retrieval and verse classification and show that our embeddings surpass several baselines in a transfer learning setting. This demonstrates the benefits of colexification for multilingual NLP.

翻译：比较语言学中的同词异义现象指同一词汇形式表达两个或以上不同含义的语言现象。本文提出基于同词异义模式构建多语图的简洁有效方法：ColexNet和ColexNet+。ColexNet以概念为节点、同词异义关系为边；ColexNet+则在概念节点间增设中间节点，每个节点代表1334种语言中某一语言的n元语法。我们利用ColexNet+训练出高质量多语嵌入向量$\overrightarrow{\mbox{ColexNet+}}$，该向量特别适用于迁移学习场景。现有同词异义研究依赖标注词表，限制了可扩展性及在自然语言处理中的实用性。相比之下，我们直接从未标注平行语料库中识别出1335种语言中2000余个概念的同词异义模式。实验首先证明ColexNet在跨语言同词异义数据集CLICS上具有高召回率。随后我们评估$\overrightarrow{\mbox{ColexNet+}}$在往返翻译、诗歌检索与诗歌分类任务中的表现，显示其嵌入向量在迁移学习场景中超越多项基线，证明了同词异义现象对多语自然语言处理的促进作用。

0

相关内容

迁移学习

迁移学习（Transfer Learning）是一种机器学习方法，是把一个领域（即源领域）的知识，迁移到另外一个领域（即目标领域），使得目标领域能够取得更好的学习效果。迁移学习（TL）是机器学习（ML）中的一个研究问题，着重于存储在解决一个问题时获得的知识并将其应用于另一个但相关的问题。例如，在学习识别汽车时获得的知识可以在尝试识别卡车时应用。尽管这两个领域之间的正式联系是有限的，但这一领域的研究与心理学文献关于学习转移的悠久历史有关。从实践的角度来看，为学习新任务而重用或转移先前学习的任务中的信息可能会显着提高强化学习代理的样本效率。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

17篇必看[知识图谱Knowledge Graphs] 论文@AAAI2020

17篇必看[知识图谱Knowledge Graphs] 论文@AAAI2020

专知

82+阅读 · 2020年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

19篇ICML2019论文摘录选读！

19篇ICML2019论文摘录选读！

专知

28+阅读 · 2019年4月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

HOXB7基因促胃癌转移机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

领域驱动空间co-location模式挖掘技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

大白菜KIN基因的表达及其pre-mRNA加工机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

不同基因型（p53codon72）鼻咽癌细胞放射敏感性差异的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

拜城油鸡H-FABP基因多态性及其mRNA表达量与IMF含量相关性研究

国家自然科学基金

0+阅读 · 2011年12月31日

Rho信号通路在张应变诱导人牙周膜细胞细胞骨架重建中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

高频PWM多机逆变微电网的稳定性分析和能量优化管理

国家自然科学基金

0+阅读 · 2009年12月31日

栉孔扇贝细胞遗传学图谱构建

国家自然科学基金

0+阅读 · 2009年12月31日

肿瘤相关基因甲基化异常与胃癌前病变关系的病例-对照研究

国家自然科学基金

0+阅读 · 2008年12月31日

Evaluating Embedding APIs for Information Retrieval

Arxiv

0+阅读 · 2023年7月6日

A Cross-Chain Query Language for Application-Level Interoperability Between Open and Permissionless Blockchains

Arxiv

0+阅读 · 2023年7月5日

S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning

Arxiv

0+阅读 · 2023年7月5日

Multilingual Controllable Transformer-Based Lexical Simplification

Arxiv

0+阅读 · 2023年7月5日

Learning ECG signal features without backpropagation

Arxiv

0+阅读 · 2023年7月4日

Lifelong Embedding Learning and Transfer for Growing Knowledge Graphs

Arxiv

15+阅读 · 2022年11月29日

Adaptive Transfer Learning on Graph Neural Networks

Arxiv

14+阅读 · 2021年7月20日

Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets

Arxiv

15+阅读 · 2019年12月4日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

A Survey on Deep Transfer Learning

A Survey on Deep Transfer Learning

Arxiv

11+阅读 · 2018年8月6日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

3+阅读 · 6月22日

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

3+阅读 · 6月22日

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

3+阅读 · 6月22日

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

3+阅读 · 6月22日

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

3+阅读 · 6月22日

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

3+阅读 · 6月22日

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

4+阅读 · 6月22日

美国从乌克兰无人机战争中学习经验

美国从乌克兰无人机战争中学习经验

专知会员服务

7+阅读 · 6月21日

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

专知会员服务

5+阅读 · 6月21日

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

专知会员服务

8+阅读 · 6月21日

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

21+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

5+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

8+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

7+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

9+阅读 · 6月18日

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 3D场景图：开放挑战与未来方向

21世纪的无人机战争

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

相关资讯

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

17篇必看[知识图谱Knowledge Graphs] 论文@AAAI2020

17篇必看[知识图谱Knowledge Graphs] 论文@AAAI2020

专知

82+阅读 · 2020年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

19篇ICML2019论文摘录选读！

19篇ICML2019论文摘录选读！

专知

28+阅读 · 2019年4月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Evaluating Embedding APIs for Information Retrieval

Arxiv

0+阅读 · 2023年7月6日

A Cross-Chain Query Language for Application-Level Interoperability Between Open and Permissionless Blockchains

Arxiv

0+阅读 · 2023年7月5日

S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning

Arxiv

0+阅读 · 2023年7月5日

Multilingual Controllable Transformer-Based Lexical Simplification

Arxiv

0+阅读 · 2023年7月5日

Learning ECG signal features without backpropagation

Arxiv

0+阅读 · 2023年7月4日

Lifelong Embedding Learning and Transfer for Growing Knowledge Graphs

Arxiv

15+阅读 · 2022年11月29日

Adaptive Transfer Learning on Graph Neural Networks

Arxiv

14+阅读 · 2021年7月20日

Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets

Arxiv

15+阅读 · 2019年12月4日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

A Survey on Deep Transfer Learning

A Survey on Deep Transfer Learning

Arxiv

11+阅读 · 2018年8月6日

相关基金

HOXB7基因促胃癌转移机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

领域驱动空间co-location模式挖掘技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

大白菜KIN基因的表达及其pre-mRNA加工机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

不同基因型（p53codon72）鼻咽癌细胞放射敏感性差异的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

拜城油鸡H-FABP基因多态性及其mRNA表达量与IMF含量相关性研究

国家自然科学基金

0+阅读 · 2011年12月31日

Rho信号通路在张应变诱导人牙周膜细胞细胞骨架重建中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

高频PWM多机逆变微电网的稳定性分析和能量优化管理

国家自然科学基金

0+阅读 · 2009年12月31日

栉孔扇贝细胞遗传学图谱构建

国家自然科学基金

0+阅读 · 2009年12月31日

肿瘤相关基因甲基化异常与胃癌前病变关系的病例-对照研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员