We present an efficient method for adapting a monolingual Large Language Model (LLM) to another language, addressing the challenges of catastrophic forgetting and tokenizer limitations. We focus this study on adapting Llama 2 to Arabic. Our two-stage approach begins with expanding the vocabulary and training only the embedding matrix, followed by full-model continual pre-training on a bilingual corpus. By continually pre-training on a mix of Arabic and English corpora, the model retains its proficiency in English while acquiring capabilities in Arabic. Our approach yields significant improvements in Arabic and slight enhancements in English, demonstrating cost-effective cross-lingual transfer. We perform ablations on embedding initialization techniques, data mix ratios, and learning rates, and we release a detailed training recipe. To demonstrate the generalizability of this approach, we also adapt Llama 3 8B to Arabic and Llama 2 13B to Hindi.
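One common embedding-initialization technique that ablations of this kind compare is initializing each new token's embedding as the mean of the embeddings of the subword pieces it decomposes into under the original tokenizer. The sketch below illustrates the idea only; the token names, toy dimensions, and `mean_init` helper are hypothetical and not the paper's actual implementation.

```python
import random

random.seed(0)
DIM = 4  # toy embedding dimension for illustration

# Original (base-model) vocabulary with small random toy embeddings.
base_vocab = {tok: [random.gauss(0, 0.02) for _ in range(DIM)]
              for tok in ["▁ki", "tab", "▁al", "lugha"]}

def mean_init(subword_pieces, vocab):
    """Initialize a new token's embedding as the element-wise mean of
    the embeddings of its decomposition under the old tokenizer."""
    rows = [vocab[p] for p in subword_pieces]
    return [sum(col) / len(rows) for col in zip(*rows)]

# A hypothetical new Arabic token whose old-tokenizer decomposition
# is ["▁ki", "tab"]; its embedding is the mean of those two rows.
vec = mean_init(["▁ki", "tab"], base_vocab)
print(len(vec))
```

In the two-stage recipe described above, rows initialized this way would then be the only trainable parameters during the first stage, before full-model continual pre-training begins.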