We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese, using a search engine to access knowledge on the Internet. GLM-Dialog offers a series of practical techniques for exploiting various kinds of external knowledge, both helpful and noisy, enabling the creation of robust knowledge-grounded dialogue LLMs when suitable datasets are limited. To evaluate GLM-Dialog more fairly, we also propose a novel evaluation method that lets humans converse with multiple deployed bots simultaneously and compare their performance implicitly, instead of rating them explicitly with multidimensional metrics. Comprehensive evaluations, from automatic to human, demonstrate the advantages of GLM-Dialog compared with existing open-source Chinese dialogue models. We release both the model checkpoint and the source code, and also deploy the model as a WeChat application to interact with users. We offer our evaluation platform online in an effort to promote the development of open-source models and reliable dialogue evaluation systems. An additional easy-to-use toolkit consisting of short-text entity linking, query generation, and helpful-knowledge classification is also released to enable diverse applications. All the source code is available on GitHub.