BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

May 25, 2023

Zheng-Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri Aji, David Ifeoluwa Adelani, Khalid Almubarak, M Saiful Bari, Lintang Sutawika, Jungo Kasai, Ahmed Baruwa, Genta Indra Winata, Stella Biderman, Edward Raff, Dragomir Radev, Vassilina Nikoulina

The BLOOM model is a large publicly available multilingual language model, but its pretraining was limited to 46 languages. To extend the benefits of BLOOM to other languages without incurring prohibitively large costs, it is desirable to adapt BLOOM to new languages not seen during pretraining. In this work, we apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages in a resource-constrained setting. We find language adaptation to be effective at improving zero-shot performance in new languages. Surprisingly, we find that adapter-based finetuning is more effective than continued pretraining for large models. In addition, we discover that prompting performance is not significantly affected by language specifics, such as the writing system. It is primarily determined by the size of the language adaptation data. We also add new languages to BLOOMZ, which is a multitask finetuned version of BLOOM capable of following task instructions zero-shot. We find including a new language in the multitask finetuning mixture to be the most effective method to teach BLOOMZ a new language. We conclude that, with sufficient training data, language adaptation can generalize well to diverse languages. Our code is available at https://github.com/bigscience-workshop/multilingual-modeling.
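The abstract contrasts adapter-based finetuning with continued pretraining but does not say which adapter design is used. As a rough illustration only, the sketch below assumes a generic residual bottleneck adapter (down-projection, ReLU, up-projection, skip connection) with toy dimensions. The class name `BottleneckAdapter`, the sizes, and the zero initialization of the up-projection are all assumptions made for this example, not details taken from the paper or its repository.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class BottleneckAdapter:
    """Hypothetical residual adapter: h + W_up @ relu(W_down @ h)."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection: hidden_size -> bottleneck_size, small random init.
        self.W_down = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(bottleneck_size)
        ]
        # Up-projection: bottleneck_size -> hidden_size, zero init (assumed),
        # so the adapter starts out as the identity function.
        self.W_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def forward(self, h):
        z = [max(0.0, v) for v in matvec(self.W_down, h)]  # ReLU bottleneck
        delta = matvec(self.W_up, z)
        return [hi + di for hi, di in zip(h, delta)]  # residual connection

    def num_parameters(self):
        return sum(len(r) for r in self.W_down) + sum(len(r) for r in self.W_up)


# Toy sizes, chosen only to keep the example small.
hidden_size, bottleneck_size = 8, 2
adapter = BottleneckAdapter(hidden_size, bottleneck_size)

h = [0.5] * hidden_size
out = adapter.forward(h)

# Stand-in for a single frozen dense layer of the base model.
frozen_layer_params = hidden_size * hidden_size
adapter_params = adapter.num_parameters()
```

In this setup only the adapter weights would be trained while the base model stays frozen, whereas continued pretraining updates the base model's own weights. That difference in trainable parameter count is the usual motivation for adapters in resource-constrained settings.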
