The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring $2\times$ fewer pre-training tokens. Unlike prior practices that provide only model weights and inference code, and that pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code, along with pre-trained model weights and training recipes, is available at \url{https://github.com/apple/corenet}. Additionally, OpenELM models can be found on HuggingFace at \url{https://huggingface.co/apple/OpenELM}.
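As a rough sketch of the layer-wise scaling strategy mentioned above (the symbols and the linear interpolation shown here are illustrative assumptions, not the released OpenELM configurations): for a model with $N$ transformer layers, model dimension $d_{model}$, and per-head dimension $d_h$, layer $i \in \{0, \dots, N-1\}$ can be assigned
\[
\alpha_i = \alpha_{\min} + \frac{(\alpha_{\max} - \alpha_{\min})\, i}{N - 1}, \qquad
\beta_i  = \beta_{\min}  + \frac{(\beta_{\max}  - \beta_{\min})\, i}{N - 1},
\]
\[
n_h^{(i)} = \left\lfloor \frac{\alpha_i \, d_{model}}{d_h} \right\rfloor, \qquad
d_{\mathrm{FFN}}^{(i)} = \beta_i \, d_{model},
\]
so that the number of attention heads $n_h^{(i)}$ and the FFN width $d_{\mathrm{FFN}}^{(i)}$ grow with depth, spending more of the fixed parameter budget in later layers instead of allocating it uniformly across all layers.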