We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. These represent our most capable models, trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models have been pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small corpus spanning 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process involving supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 on general benchmarks such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) approaches GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks, and 4) outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use -- including web browser, Python interpreter, text-to-image model, and user-defined functions -- to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All Tools in tasks such as accessing online information via web browsing and solving math problems with the Python interpreter. Over the course of this development, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in 2023 alone. The open models can be accessed through https://github.com/THUDM and https://huggingface.co/THUDM.