Large Language Models (LLMs) demonstrate significant capabilities in natural language understanding and generation. With the growing need to apply LLMs across various domains, it is an open research question how to efficiently build a model that has expertise in multiple domains at a low training cost. We propose CCoE, a framework that easily couples multiple strong domain experts into one large LLM and provides a collective way of utilizing different domain-expert LLMs. Moreover, jointly training a large collaboration of multiple expert LLMs places high demands on training resources. CCoE bypasses this problem by isolating the experts from one another and training each expert separately. The CCoE design assembles multiple expert LLMs through the CoE (Collaboration of Experts) layer. Each CoE layer can hold one or more expert LLMs; the expert LLMs have different numbers of layers and are well-trained for different domain tasks. Each expert is fine-tuned to achieve results comparable to SOTA domain LLMs. We start with five experts covering the domains of Code, Math, Law, text-to-SQL, and Medical. The results indicate that our CCoE framework can easily and efficiently boost performance by nearly 10%-20% over the original base model across these domains, while using fewer resources for both training and inference.
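The core idea of the abstract, separately trained domain experts assembled behind a single dispatch point, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not specify how queries are routed to experts, so the keyword router, the expert callables, and all names here are hypothetical, not the paper's implementation.

```python
# Sketch of a CoE (Collaboration of Experts) layer: independently trained
# domain experts sit behind a router that dispatches each query to exactly
# one expert. The routing rule below is an illustrative assumption.

class CoELayer:
    def __init__(self, experts):
        # experts: dict mapping domain name -> callable expert model
        self.experts = experts

    def route(self, query):
        # Hypothetical keyword-based router; the paper does not
        # describe its actual routing mechanism.
        keyword_map = {
            "code": ("def ", "function", "compile"),
            "math": ("equation", "integral", "solve"),
            "law": ("contract", "statute"),
        }
        q = query.lower()
        for domain, keywords in keyword_map.items():
            if any(k in q for k in keywords):
                return domain
        return "general"

    def __call__(self, query):
        # Only the selected expert runs, so experts can be trained
        # (and swapped) in isolation from one another.
        expert = self.experts.get(self.route(query), self.experts["general"])
        return expert(query)

# Stand-in experts; in practice each would be a fine-tuned domain LLM.
experts = {
    "code": lambda q: f"[code expert] {q}",
    "math": lambda q: f"[math expert] {q}",
    "law": lambda q: f"[law expert] {q}",
    "general": lambda q: f"[base model] {q}",
}
coe = CoELayer(experts)
print(coe("solve this equation"))  # dispatched to the math expert
```

Because each expert is reached only through the dispatch step, adding a new domain amounts to training one more expert and registering it in the layer, which matches the low-training-cost motivation in the abstract.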