This report introduces the Qwen2 series, the latest addition to our family of large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models spanning 0.5 to 72 billion parameters, comprising dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks in language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, showcases remarkable performance as a base language model: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH. The instruction-tuned variant, Qwen2-72B-Instruct, attains 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Moreover, Qwen2 demonstrates robust multilingual capabilities, proficient in approximately 30 languages, including English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, and Vietnamese, underscoring its versatility and global reach. To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face and ModelScope, along with supplementary materials, including example code, on GitHub. These platforms also include resources for quantization, fine-tuning, and deployment, facilitating a wide range of applications and research endeavors.