Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker

Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced. Aya outperforms mT0 and BLOOMZ on the majority of tasks while covering double the number of languages. We introduce extensive new evaluation suites that broaden the state of the art for multilingual evaluation across 99 languages, including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance. Furthermore, we conduct detailed investigations on the optimal finetuning mixture composition, data pruning, as well as the toxicity, bias, and safety of our models. We open-source our instruction datasets and our model at https://hf.co/CohereForAI/aya-101
