Large Language Models (LLMs) demonstrate significant capabilities in natural language understanding and generation. With the growing need to apply LLMs across various domains, an open research question is how to efficiently train and build a model that has expertise in multiple domains at a low training cost. We propose CCoE, a framework that easily couples multiple strong domain experts into one large LLM, providing a collective way of utilizing different domain-expert LLMs. Moreover, training a large collaboration of multiple expert LLMs places high demands on training resources. CCoE bypasses this problem by isolating the experts and training each expert separately. The CCoE design assembles multiple expert LLMs through CoE (Collaboration of Experts) layers. Each CoE layer can hold one or more expert LLMs. The expert LLMs have different numbers of layers and have been well trained for different domain tasks. Each expert is fine-tuned to achieve results comparable to SOTA domain LLMs. We start with 5 experts in the domains of Code, Math, Law, text-to-SQL, and Medical. The results indicate that our CCoE framework can easily and efficiently boost performance by nearly 10%-20% over the original base model across these domains, while using fewer resources for both training and inference.
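To make the CoE-layer idea concrete, the following is a minimal sketch of routing a query to one of several isolated domain experts. All class names, the keyword-based router, and the `Expert`/`CoELayer` interfaces are hypothetical illustrations; the abstract does not specify how CCoE actually selects an expert or how the experts are implemented.

```python
class Expert:
    """Stand-in for a separately fine-tuned domain expert LLM (hypothetical)."""
    def __init__(self, domain):
        self.domain = domain

    def generate(self, prompt):
        # A real expert would run its own transformer layers here.
        return f"[{self.domain} expert] answer to: {prompt}"

class CoELayer:
    """Holds one or more domain experts and routes each query to one of them.

    Because experts are trained in isolation, adding or replacing an expert
    does not require retraining the others.
    """
    def __init__(self, experts, keywords):
        self.experts = experts    # domain -> Expert
        self.keywords = keywords  # keyword -> domain (toy routing rule)

    def route(self, prompt):
        for kw, domain in self.keywords.items():
            if kw in prompt.lower():
                return self.experts[domain]
        return self.experts["base"]  # fall back to the base model

    def generate(self, prompt):
        return self.route(prompt).generate(prompt)

# The five starting domains from the abstract, plus a base-model fallback.
domains = ["Code", "Math", "Law", "text-to-SQL", "Medical", "base"]
layer = CoELayer(
    experts={d: Expert(d) for d in domains},
    keywords={"sql": "text-to-SQL", "def ": "Code", "integral": "Math",
              "contract": "Law", "diagnosis": "Medical"},
)

print(layer.generate("Write an SQL query listing all users"))
# Routed to the text-to-SQL expert by the toy keyword rule.
```

The point of the sketch is the isolation property claimed in the abstract: each `Expert` is self-contained, so a CoE layer can swap experts in and out without touching the rest of the model.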