Large Language Models (LLMs) demonstrate significant capabilities in natural language understanding and generation. With the growing need to apply LLMs across various domains, it is an open research question how to efficiently build a model that has expertise in multiple domains at a low training cost. We propose CCoE, a framework that easily couples multiple strong domain experts into one large LLM and provides a collective way of utilizing different domain-expert LLMs. Moreover, jointly training a large collaboration of multiple expert LLMs places high demands on training resources. CCoE bypasses this problem by isolating the experts from one another and training each expert separately. The CCoE design assembles multiple expert LLMs through the CoE (Collaboration of Experts) layer. Each CoE layer can hold one or more expert LLMs; the expert LLMs have different numbers of layers and are well-trained for different domain tasks. Each expert is fine-tuned to achieve results comparable to SOTA domain LLMs. We start with five experts covering the domains of Code, Math, Law, text-to-SQL, and Medical. The results indicate that our CCoE framework can easily and efficiently boost performance by nearly 10%-20% over the original base model across these domains, while using fewer resources for both training and inference.
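The core idea of the abstract, separately trained domain experts assembled behind a single dispatch point, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not specify how queries are routed to experts, so the keyword router, the expert callables, and all names here are hypothetical, not the paper's implementation.

```python
# Sketch of a CoE (Collaboration of Experts) layer: independently trained
# domain experts sit behind a router that dispatches each query to exactly
# one expert. The routing rule below is an illustrative assumption.

class CoELayer:
    def __init__(self, experts):
        # experts: dict mapping domain name -> callable expert model
        self.experts = experts

    def route(self, query):
        # Hypothetical keyword-based router; the paper does not
        # describe its actual routing mechanism.
        keyword_map = {
            "code": ("def ", "function", "compile"),
            "math": ("equation", "integral", "solve"),
            "law": ("contract", "statute"),
        }
        q = query.lower()
        for domain, keywords in keyword_map.items():
            if any(k in q for k in keywords):
                return domain
        return "general"

    def __call__(self, query):
        # Only the selected expert runs, so experts can be trained
        # (and swapped) in isolation from one another.
        expert = self.experts.get(self.route(query), self.experts["general"])
        return expert(query)

# Stand-in experts; in practice each would be a fine-tuned domain LLM.
experts = {
    "code": lambda q: f"[code expert] {q}",
    "math": lambda q: f"[math expert] {q}",
    "law": lambda q: f"[law expert] {q}",
    "general": lambda q: f"[base model] {q}",
}
coe = CoELayer(experts)
print(coe("solve this equation"))  # dispatched to the math expert
```

Because each expert is reached only through the dispatch step, adding a new domain amounts to training one more expert and registering it in the layer, which matches the low-training-cost motivation in the abstract.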