Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords - 专知论文

会员服务 ·

0

掩码 · 可辨认的 · 目标领域 · 语言模型化 · BERT ·

2023 年 7 月 14 日

Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords

翻译：不要随机掩码：通过掩码领域关键词进行有效的领域自适应预训练

Shahriar Golchin,Mihai Surdeanu,Nazgol Tavabi,Ata Kiapour

from arxiv, final version: accepted at ACL'23 RepL4NLP. arXiv admin note: text overlap with arXiv:2208.12367

We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).

翻译：我们提出了一种新颖的任务无关领域内预训练方法，该方法位于通用预训练和微调之间。我们的方法选择性掩码领域关键词，即能够提供目标领域紧凑表征的词汇。我们利用KeyBERT（Grootendorst, 2020）识别此类关键词。我们通过六种不同设置评估了该方法：三个数据集与两种不同预训练语言模型（PLM）的组合。结果表明，采用领域内预训练策略适应的微调PLM，其性能优于使用随机掩码进行领域内预训练的PLM，以及遵循通用先预训练后微调范式的PLM。此外，识别领域关键词的额外开销合理，例如，针对BERT Large（Devlin等人，2019），该过程仅占用总预训练时间（两轮迭代）的7-15%。

0

相关内容

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

3+阅读 · 2016年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

Volterra积分微分方程的多区间Chebyshev和Legendre谱配置法

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

Weyl半金属TaAs/NbAs在极端条件下的输运性质研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

Promoting Open-domain Dialogue Generation through Learning Pattern Information between Contexts and Responses

Arxiv

0+阅读 · 2023年9月6日

Enhance Multi-domain Sentiment Analysis of Review Texts through Prompting Strategies

Arxiv

1+阅读 · 2023年9月5日

Analyzing domain shift when using additional data for the MICCAI KiTS23 Challenge

Arxiv

0+阅读 · 2023年9月5日

Geometry-aware Line Graph Transformer Pre-training for Molecular Property Prediction

Arxiv

0+阅读 · 2023年9月1日

Remixing-based Unsupervised Source Separation from Scratch

Arxiv

0+阅读 · 2023年9月1日

VLP: A Survey on Vision-Language Pre-training

VLP: A Survey on Vision-Language Pre-training

Arxiv

11+阅读 · 2022年2月21日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

Arxiv

10+阅读 · 2019年9月15日

VIP会员

文章信息

相关主题

语言模型化

最新内容

认知战：定义与能力发展

认知战：定义与能力发展

专知会员服务

0+阅读 · 今天9:25

2026年美国防部人工智能政策如何将国防人工智能转向速度、规模与“人工智能优先”作战

2026年美国防部人工智能政策如何将国防人工智能转向速度、规模与“人工智能优先”作战

专知会员服务

1+阅读 · 今天9:09

《伊朗-以色列对抗中的算法化目标选定：技术现实、法律门槛与人类控制的边界》

《伊朗-以色列对抗中的算法化目标选定：技术现实、法律门槛与人类控制的边界》

专知会员服务

1+阅读 · 今天9:04

《红外图像中掩埋目标检测的深度学习方法》2026最新报告

《红外图像中掩埋目标检测的深度学习方法》2026最新报告

专知会员服务

1+阅读 · 今天9:00

《小部队领导者运用新技术训练与制胜指南》2026最新50页

《小部队领导者运用新技术训练与制胜指南》2026最新50页

专知会员服务

1+阅读 · 今天8:23

乌军利用美国“黄蜂”无人机摧毁俄军后勤

乌军利用美国“黄蜂”无人机摧毁俄军后勤

专知会员服务

5+阅读 · 6月7日

《支持作战级人机协同智能的交互式OODA流程》

《支持作战级人机协同智能的交互式OODA流程》

专知会员服务

14+阅读 · 6月7日

《军事地面机动的概率等时分析：未来自适应模型的多方法协同》

《军事地面机动的概率等时分析：未来自适应模型的多方法协同》

专知会员服务

6+阅读 · 6月7日

大语言模型与物联网：大语言模型与物联网融合全面综述

大语言模型与物联网：大语言模型与物联网融合全面综述

专知会员服务

11+阅读 · 6月7日

【伯克利博士论文】基于动作分块策略的强化学习

【伯克利博士论文】基于动作分块策略的强化学习

专知会员服务

5+阅读 · 6月7日

Transformer增强强化学习：通信网络基础与应用综述

Transformer增强强化学习：通信网络基础与应用综述

专知会员服务

5+阅读 · 6月7日

ICML 2026 | SARDI：扩散语言模型的自增强检索

ICML 2026 | SARDI：扩散语言模型的自增强检索

专知会员服务

8+阅读 · 6月6日

长时程具身智能安全综述：机器人操作的跨层分析

长时程具身智能安全综述：机器人操作的跨层分析

专知会员服务

10+阅读 · 6月6日

从“杀伤链”到“杀伤网”：新时代防空反导体系的真正需求

从“杀伤链”到“杀伤网”：新时代防空反导体系的真正需求

专知会员服务

15+阅读 · 6月6日

《锻造军官能力：军官发展的军事训练、学术教育及设计思维导向创新的多维度研究》最新300页

《锻造军官能力：军官发展的军事训练、学术教育及设计思维导向创新的多维度研究》最新300页

专知会员服务

10+阅读 · 6月6日

相关VIP内容

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

2026年美国防部人工智能政策如何将国防人工智能转向速度、规模与“人工智能优先”作战

《红外图像中掩埋目标检测的深度学习方法》2026最新报告

认知战：定义与能力发展

《伊朗-以色列对抗中的算法化目标选定：技术现实、法律门槛与人类控制的边界》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

相关论文

Promoting Open-domain Dialogue Generation through Learning Pattern Information between Contexts and Responses

Arxiv

0+阅读 · 2023年9月6日

Enhance Multi-domain Sentiment Analysis of Review Texts through Prompting Strategies

Arxiv

1+阅读 · 2023年9月5日

Analyzing domain shift when using additional data for the MICCAI KiTS23 Challenge

Arxiv

0+阅读 · 2023年9月5日

Geometry-aware Line Graph Transformer Pre-training for Molecular Property Prediction

Arxiv

0+阅读 · 2023年9月1日

Remixing-based Unsupervised Source Separation from Scratch

Arxiv

0+阅读 · 2023年9月1日

VLP: A Survey on Vision-Language Pre-training

VLP: A Survey on Vision-Language Pre-training

Arxiv

11+阅读 · 2022年2月21日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

Arxiv

10+阅读 · 2019年9月15日

相关基金

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

3+阅读 · 2016年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

Volterra积分微分方程的多区间Chebyshev和Legendre谱配置法

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

Weyl半金属TaAs/NbAs在极端条件下的输运性质研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员