OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari

From arXiv; minor corrections

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring $2\times$ fewer pre-training tokens. Diverging from prior practices that provide only model weights and inference code and pre-train on private datasets, our release includes the complete framework for training and evaluating the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code, along with pre-trained model weights and training recipes, is available at https://github.com/apple/corenet. Additionally, OpenELM models can be found on HuggingFace at https://huggingface.co/apple/OpenELM.
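The abstract does not spell out the layer-wise scaling rule, so the following is only a minimal sketch of the general idea. It assumes that the number of attention heads and the feed-forward width grow linearly with layer depth rather than staying constant. The function name `layerwise_scaling`, its parameters, and the multiplier ranges `alpha` and `beta` are hypothetical choices made for illustration, not values taken from the release.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a (num_heads, ffn_dim) pair for every transformer layer.

    Assumption: both multipliers are linearly interpolated from their
    minimum (first layer) to their maximum (last layer).
    alpha scales the attention width, beta scales the feed-forward width.
    """
    configs = []
    for i in range(num_layers):
        # Relative depth in [0, 1]; a single-layer model uses the minimum.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


if __name__ == "__main__":
    # Illustrative sizes only: 4 layers, model width 1024, head width 64.
    for layer, (heads, ffn) in enumerate(layerwise_scaling(4, 1024, 64)):
        print(f"layer {layer}: heads={heads}, ffn_dim={ffn}")
```

Under these assumptions, early layers receive fewer heads and a narrower feed-forward block than later ones, so a fixed parameter budget is spread non-uniformly across depth instead of being split evenly.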

