We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models that aims to address the critical challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models. By leveraging several years of research at Cohere For AI and Cohere, including advancements in data arbitrage, multilingual preference training, and model merging, Aya Expanse sets a new state-of-the-art in multilingual performance. Our evaluations on the Arena-Hard-Auto dataset, translated into 23 languages, demonstrate that Aya Expanse 8B and 32B outperform leading open-weight models in their respective parameter classes, including Gemma 2, Qwen 2.5, and Llama 3.1, achieving win rates of up to 76.6%. Notably, Aya Expanse 32B outperforms Llama 3.1 70B, a model with more than twice as many parameters, achieving a 54.0% win rate. In this short technical report, we present extended evaluation results for the Aya Expanse model family and release their open weights, together with m-ArenaHard, a new multilingual evaluation dataset.