We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. These represent our most capable models, trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models have been pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small corpus covering 24 other languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process involving supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4: 1) closely rivals or outperforms GPT-4 on general benchmarks such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval; 2) approaches GPT-4-Turbo in instruction following as measured by IFEval; 3) matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks; and 4) outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use -- including a web browser, a Python interpreter, a text-to-image model, and user-defined functions -- to effectively complete complex tasks. In practical applications, it matches or even surpasses GPT-4 All Tools in tasks such as accessing online information via web browsing and solving math problems with the Python interpreter. Along the way, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in 2023 alone. The open models can be accessed through https://github.com/THUDM and https://huggingface.co/THUDM.