Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE - 专知论文

会员服务 ·

0

GLUE · Bagging · MoDELS · 语言模型化 · 可理解性 ·

2023 年 2 月 18 日

Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE

翻译：有效语言模型预训练与下游适配的技巧集锦：以GLUE为例的案例研究

Qihuang Zhong,Liang Ding,Keqin Peng,Juhua Liu,Bo Du,Li Shen,Yibing Zhan,Dacheng Tao

from arxiv, Technical report. arXiv admin note: text overlap with arXiv:2212.01853

This technical report briefly describes our JDExplore d-team's submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference. [Method] We investigate several effective strategies and choose their best combination setting as the training recipes. As for model structure, we employ the vanilla Transformer with disentangled attention as the basic block encoder. For self-supervised training, we employ the representative denoising objective (i.e., replaced token detection) in phase 1 and combine the contrastive objective (i.e., sentence embedding contrastive learning) with it in phase 2. During fine-tuning, several advanced techniques such as transductive fine-tuning, self-calibrated fine-tuning, and adversarial fine-tuning are adopted. [Results] According to our submission record (Jan. 2022), with our optimized pretraining and fine-tuning strategies, our 1.3 billion model sets new state-of-the-art on 4/9 tasks, achieving the best average score of 91.3. Encouragingly, our Vega v1 is the first to exceed powerful human performance on the two challenging tasks, i.e., SST-2 and WNLI. We believe our empirically successful recipe with a bag of tricks could shed new light on developing efficient discriminative large language models.

翻译：本技术报告简要描述了我们的JDExplore d团队在通用语言理解评估（GLUE）排行榜上的提交成果Vega v1。GLUE包含九项自然语言理解任务，涵盖问答、语言可接受性、情感分析、文本相似度、释义检测及自然语言推理。[方法]我们探索了多种有效策略，并选取其最佳组合配置作为训练方案。模型结构方面，采用基于解耦注意力的标准Transformer作为基本块编码器。自监督训练中，第一阶段使用代表性去噪目标（即替代词检测），第二阶段将其与对比学习目标（即句子嵌入对比学习）结合。微调阶段采用了直推式微调、自校准微调及对抗性微调等先进技术。[结果]根据我们2022年1月的提交记录，通过优化预训练与微调策略，我们的13亿参数模型在4/9任务上创下新最优结果，平均得分达91.3。值得注意的是，Vega v1首次在SST-2和WNLI两项具有挑战性的任务上超越强大的人类表现。我们相信，这套经验验证有效的技巧集锦将为开发高效判别式大语言模型提供新思路。

0

相关内容

GLUE

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

51+阅读 · 2020年5月3日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

SMAD2调控ERK通路干预M2巨噬细胞活化在糖尿病肾病小鼠肾脏纤维化中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

neuritin抑制神经炎症改善糖尿病中枢神经病变的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Resveratrol联合MSCs移植对阿尔茨海默鼠的干预效果及Sirt1分子信号的介导作用

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

雷公藤多苷联合小檗碱预防和治疗2型糖尿病肾小管间质病变的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

腺嘌呤去甲基化酶FTO对2型糖尿病糖脂代谢关键基因的调控作用

国家自然科学基金

0+阅读 · 2014年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

风暴潮灾害脆弱性测度及损失补偿对策研究

国家自然科学基金

0+阅读 · 2012年12月31日

弱光非线性光子学材料的缺陷设计与构筑

国家自然科学基金

0+阅读 · 2012年12月31日

Curcumin双向调控HO-1/HO-2协同抑制Aβeme复合物防治AD的分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

Continual Pre-training of Language Models

Arxiv

0+阅读 · 2023年4月12日

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

Arxiv

0+阅读 · 2023年4月11日

For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

Arxiv

0+阅读 · 2023年4月10日

Training Language Models with Language Feedback at Scale

Arxiv

0+阅读 · 2023年4月9日

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Arxiv

0+阅读 · 2023年4月8日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Arxiv

11+阅读 · 2019年10月30日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

16+阅读 · 2019年5月24日

VIP会员

文章信息

相关主题

语言模型化

最新内容

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

5+阅读 · 今天8:00

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

3+阅读 · 今天7:44

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

3+阅读 · 今天7:28

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

4+阅读 · 今天7:18

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

专知会员服务

5+阅读 · 今天7:07

军事欺骗：供作战战术指挥官使用的工具

军事欺骗：供作战战术指挥官使用的工具

专知会员服务

4+阅读 · 今天7:03

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

4+阅读 · 6月23日

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

5+阅读 · 6月23日

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

10+阅读 · 6月23日

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

4+阅读 · 6月23日

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

5+阅读 · 6月23日

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

8+阅读 · 6月23日

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

7+阅读 · 6月23日

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

4+阅读 · 6月23日

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

6+阅读 · 6月22日

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

51+阅读 · 2020年5月3日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

重新思考无人机时代的生存能力

在人工智能加速决策环境中拓展OODA循环

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

装甲突击旅：现代战争思考、战斗与组织

相关资讯

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

相关论文

Continual Pre-training of Language Models

Arxiv

0+阅读 · 2023年4月12日

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

Arxiv

0+阅读 · 2023年4月11日

For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

Arxiv

0+阅读 · 2023年4月10日

Training Language Models with Language Feedback at Scale

Arxiv

0+阅读 · 2023年4月9日

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Arxiv

0+阅读 · 2023年4月8日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Arxiv

11+阅读 · 2019年10月30日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

16+阅读 · 2019年5月24日

相关基金

SMAD2调控ERK通路干预M2巨噬细胞活化在糖尿病肾病小鼠肾脏纤维化中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

neuritin抑制神经炎症改善糖尿病中枢神经病变的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Resveratrol联合MSCs移植对阿尔茨海默鼠的干预效果及Sirt1分子信号的介导作用

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

雷公藤多苷联合小檗碱预防和治疗2型糖尿病肾小管间质病变的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

腺嘌呤去甲基化酶FTO对2型糖尿病糖脂代谢关键基因的调控作用

国家自然科学基金

0+阅读 · 2014年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

风暴潮灾害脆弱性测度及损失补偿对策研究

国家自然科学基金

0+阅读 · 2012年12月31日

弱光非线性光子学材料的缺陷设计与构筑

国家自然科学基金

0+阅读 · 2012年12月31日

Curcumin双向调控HO-1/HO-2协同抑制Aβeme复合物防治AD的分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员