Large Language Models (LLMs) demonstrate significant capabilities in natural language understanding and generation. With the growing need to apply LLMs across various domains, an open research question is how to efficiently train and build a model that has expertise in multiple domains at a low training cost. We propose CCoE, a framework that easily couples multiple strong domain experts into one large LLM, providing a collective way of utilizing different domain-expert LLMs. Moreover, training a large collaboration of multiple expert LLMs places high demands on training resources. CCoE bypasses this problem by isolating the experts and training each expert separately. The CCoE design assembles multiple expert LLMs through CoE (Collaboration of Experts) layers. Each CoE layer can hold one or more expert LLMs. The expert LLMs have different numbers of layers and have been well trained for different domain tasks. Each expert is fine-tuned to achieve results comparable to SOTA domain LLMs. We start with 5 experts in the domains of Code, Math, Law, text-to-SQL, and Medical. The results indicate that our CCoE framework can easily and efficiently boost performance by nearly 10%-20% over the original base model across these domains, while using fewer resources for both training and inference.
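To make the CoE-layer idea concrete, the following is a minimal sketch of routing a query to one of several isolated domain experts. All class names, the keyword-based router, and the `Expert`/`CoELayer` interfaces are hypothetical illustrations; the abstract does not specify how CCoE actually selects an expert or how the experts are implemented.

```python
class Expert:
    """Stand-in for a separately fine-tuned domain expert LLM (hypothetical)."""
    def __init__(self, domain):
        self.domain = domain

    def generate(self, prompt):
        # A real expert would run its own transformer layers here.
        return f"[{self.domain} expert] answer to: {prompt}"

class CoELayer:
    """Holds one or more domain experts and routes each query to one of them.

    Because experts are trained in isolation, adding or replacing an expert
    does not require retraining the others.
    """
    def __init__(self, experts, keywords):
        self.experts = experts    # domain -> Expert
        self.keywords = keywords  # keyword -> domain (toy routing rule)

    def route(self, prompt):
        for kw, domain in self.keywords.items():
            if kw in prompt.lower():
                return self.experts[domain]
        return self.experts["base"]  # fall back to the base model

    def generate(self, prompt):
        return self.route(prompt).generate(prompt)

# The five starting domains from the abstract, plus a base-model fallback.
domains = ["Code", "Math", "Law", "text-to-SQL", "Medical", "base"]
layer = CoELayer(
    experts={d: Expert(d) for d in domains},
    keywords={"sql": "text-to-SQL", "def ": "Code", "integral": "Math",
              "contract": "Law", "diagnosis": "Medical"},
)

print(layer.generate("Write an SQL query listing all users"))
# Routed to the text-to-SQL expert by the toy keyword rule.
```

The point of the sketch is the isolation property claimed in the abstract: each `Expert` is self-contained, so a CoE layer can swap experts in and out without touching the rest of the model.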