We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality Web data, consisting primarily of English tokens, in three stages with different data mixes before final supervised tuning for the chat version. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be run efficiently on a modern smartphone, enabling local inference and rapid processing capabilities even on mobile devices. We make all models openly available under the Apache 2.0 license, further democratizing LLMs economically to a wider audience.