We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models that aims to address the critical challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models. By leveraging several years of research at Cohere For AI and Cohere, including advancements in data arbitrage, multilingual preference training, and model merging, Aya Expanse sets a new state-of-the-art in multilingual performance. Our evaluations on the Arena-Hard-Auto dataset, translated into 23 languages, demonstrate that Aya Expanse 8B and 32B outperform leading open-weight models in their respective parameter classes, including Gemma 2, Qwen 2.5, and Llama 3.1, achieving win rates of up to 76.6%. Notably, Aya Expanse 32B outperforms Llama 3.1 70B, a model with more than twice as many parameters, achieving a 54.0% win rate. In this short technical report, we present extended evaluation results for the Aya Expanse model family and release their open weights, together with m-ArenaHard, a new multilingual evaluation dataset.