The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring $2\times$ fewer pre-training tokens. Unlike prior practices, which provide only model weights and inference code and pre-train on private datasets, our release includes the complete framework for training and evaluating the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code, along with pre-trained model weights and training recipes, is available at \url{https://github.com/apple/corenet}. Additionally, \model models can be found on HuggingFace at: \url{https://huggingface.co/apple/OpenELM}.
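To make the idea of layer-wise scaling concrete, the following is a minimal sketch, assuming that each transformer layer's attention-head count and feed-forward width are obtained by linearly interpolating two scaling factors from the first layer to the last. The function name `layerwise_scaling`, the parameters `alpha` and `beta`, and their default ranges are illustrative assumptions, not the released configuration; the point is only that layers no longer share identical widths, so parameters are distributed unevenly across depth.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Hypothetical per-layer (num_heads, ffn_dim) allocation.

    Assumption: alpha scales the attention width and beta scales the
    FFN width, both interpolated linearly from layer 0 to the last layer.
    """
    alpha_min, alpha_max = alpha
    beta_min, beta_max = beta
    configs = []
    for i in range(num_layers):
        # Fraction of the way through the network (0.0 at the first layer, 1.0 at the last).
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha_min + (alpha_max - alpha_min) * t
        b = beta_min + (beta_max - beta_min) * t
        # Number of heads derived from the scaled attention width; at least one head.
        num_heads = max(1, round(a * d_model / head_dim))
        # Hidden size of the feed-forward block.
        ffn_dim = round(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


if __name__ == "__main__":
    for layer, (heads, ffn) in enumerate(layerwise_scaling(4, 1024, 64)):
        print(f"layer {layer}: heads={heads}, ffn_dim={ffn}")
```

Under these assumed settings, early layers receive fewer heads and narrower feed-forward blocks than later layers, whereas a standard isotropic transformer would use the same width in every layer.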