We present ALLaM: Arabic Large Language Model, a series of large language models built to support the ecosystem of Arabic Language Technologies (ALT). ALLaM is carefully trained with language alignment and knowledge transfer at scale in mind. Our autoregressive decoder-only models demonstrate how second-language acquisition, via vocabulary expansion and pretraining on a mixture of Arabic and English text, can steer a model towards a new language (Arabic) without catastrophic forgetting of the original language (English). Furthermore, we highlight the effectiveness of parallel/translated data in aiding knowledge alignment between languages. Finally, we show that extensive alignment with human preferences can significantly enhance the performance of a language model compared to larger-scale models with lower-quality alignment. ALLaM achieves state-of-the-art performance on various Arabic benchmarks, including MMLU Arabic, ACVA, and Arabic Exams. Our aligned models improve in both Arabic and English over their base models.
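To make the vocabulary-expansion step concrete, the sketch below shows the generic mechanics in a HuggingFace-style stack: new Arabic subword tokens are added to an existing tokenizer and the embedding matrix is grown so the new ids get trainable rows while the original (English) embeddings stay intact. This is a minimal illustration under stated assumptions, not ALLaM's actual recipe; the base model name and the token list are placeholders.

```python
# Minimal sketch of vocabulary expansion for second-language acquisition.
# Assumptions: a decoder-only base model ("gpt2" here as a small stand-in)
# and a hypothetical Arabic subword inventory mined from an Arabic corpus.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "gpt2"  # placeholder for any autoregressive decoder-only base
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical new Arabic tokens (in practice, thousands of subwords
# learned from Arabic text would be merged into the vocabulary).
arabic_tokens = ["السلام", "عليكم", "اللغة", "العربية"]
num_added = tokenizer.add_tokens(arabic_tokens)

# Grow the embedding matrix so the new ids have trainable rows; the
# existing English embeddings are untouched, which is what allows
# continued pretraining on mixed Arabic/English data to add the new
# language without erasing the old one.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```

After this step, continued pretraining on the Arabic/English mixture trains the new embedding rows alongside the rest of the network.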