The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring $2\times$ fewer pre-training tokens. Unlike prior practices that provide only model weights and inference code, and that pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code, along with pre-trained model weights and training recipes, is available at \url{https://github.com/apple/corenet}. Additionally, OpenELM models can be found on HuggingFace at \url{https://huggingface.co/apple/OpenELM}.
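As a rough sketch of the layer-wise scaling strategy mentioned above (the symbols and the linear interpolation shown here are illustrative assumptions, not the released OpenELM configurations): for a model with $N$ transformer layers, model dimension $d_{model}$, and per-head dimension $d_h$, layer $i \in \{0, \dots, N-1\}$ can be assigned
\[
\alpha_i = \alpha_{\min} + \frac{(\alpha_{\max} - \alpha_{\min})\, i}{N - 1}, \qquad
\beta_i  = \beta_{\min}  + \frac{(\beta_{\max}  - \beta_{\min})\, i}{N - 1},
\]
\[
n_h^{(i)} = \left\lfloor \frac{\alpha_i \, d_{model}}{d_h} \right\rfloor, \qquad
d_{\mathrm{FFN}}^{(i)} = \beta_i \, d_{model},
\]
so that the number of attention heads $n_h^{(i)}$ and the FFN width $d_{\mathrm{FFN}}^{(i)}$ grow with depth, spending more of the fixed parameter budget in later layers instead of allocating it uniformly across all layers.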