Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Large language models (LLMs) exhibit complementary strengths across tasks, which motivates research on LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine candidate answers, and such trained components generalize poorly to unseen data distributions. In addition, prior methods use textual responses as the communication medium and ignore the valuable information in the models' internal representations. In this work, we propose DeePEn, a training-free ensemble framework that fuses the probability distributions produced by different LLMs at each decoding step. Heterogeneous LLMs have different vocabularies, so their tokens are misaligned and the distributions cannot be averaged directly. To address this challenge, DeePEn maps the probability distribution of each model from its own probability space to a universal relative space, based on relative representation theory, and performs the aggregation there. We then devise a search-based inverse transformation that maps the aggregated result back to the probability space of one of the ensembled LLMs (the main model) in order to determine the next token. We conduct extensive experiments on ensembles of different numbers of LLMs, ensembles of LLMs with different architectures, and ensembles of an LLM with a specialist model. Experimental results show that (i) DeePEn achieves consistent improvements across six benchmarks covering subject examination, reasoning, and knowledge, (ii) a well-performing specialist model can benefit from a less effective LLM through distribution fusion, and (iii) DeePEn is complementary to other ensemble methods such as voting.
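
The three steps named in the abstract (map to a relative space, aggregate, search for an inverse) can be illustrated with a toy sketch. This is not the authors' implementation; it rests on several assumptions the abstract does not confirm: the relative representation of a token is taken to be its cosine similarity to a set of anchor tokens shared by both vocabularies, aggregation is a plain equal-weight average, and the inverse search is gradient descent on a squared-error loss starting from the main model's own distribution. All names (`relative_matrix`, `to_relative`, `inverse_search`) and all numbers are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def relative_matrix(embeddings, anchor_ids):
    """Row i holds the cosine similarity of token i to every anchor token."""
    anchors = [embeddings[a] for a in anchor_ids]
    return [[cosine(e, a) for a in anchors] for e in embeddings]

def to_relative(probs, rel):
    """Map a distribution over one vocabulary into the shared anchor space."""
    return [sum(p * row[j] for p, row in zip(probs, rel)) for j in range(len(rel[0]))]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_search(target, rel_main, init_probs, steps=200, lr=0.5):
    """Search for a main-model distribution whose relative image is close to target.

    Assumption: squared-error loss and plain gradient descent on the logits.
    """
    logits = [math.log(max(p, 1e-12)) for p in init_probs]
    for _ in range(steps):
        q = softmax(logits)
        diff = [r - t for r, t in zip(to_relative(q, rel_main), target)]
        # Gradient of 0.5 * ||r - target||^2 with respect to each q_i.
        g = [sum(d * row[j] for j, d in enumerate(diff)) for row in rel_main]
        # Chain rule through softmax: dL/dz_i = q_i * (g_i - sum_k q_k * g_k).
        mean_g = sum(qi * gi for qi, gi in zip(q, g))
        logits = [z - lr * qi * (gi - mean_g) for z, qi, gi in zip(logits, q, g)]
    return softmax(logits)

# Toy setup: two models with different vocabulary sizes and embedding spaces.
emb_main = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]               # main model, 3 tokens
emb_aux = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [-1.0, 0.2]]   # second model, 4 tokens
anchors_main, anchors_aux = [0, 1], [0, 2]  # positions of the shared tokens
p_main = [0.5, 0.3, 0.2]        # next-token distribution from the main model
p_aux = [0.1, 0.2, 0.6, 0.1]    # next-token distribution from the second model

rel_main = relative_matrix(emb_main, anchors_main)
rel_aux = relative_matrix(emb_aux, anchors_aux)
r_main = to_relative(p_main, rel_main)
r_aux = to_relative(p_aux, rel_aux)
target = [0.5 * a + 0.5 * b for a, b in zip(r_main, r_aux)]  # aggregate in relative space
fused = inverse_search(target, rel_main, p_main)             # back to the main vocabulary
next_token = max(range(len(fused)), key=fused.__getitem__)
```

The two distributions have different lengths and never need to be aligned token by token; they meet only in the anchor space, whose dimension equals the number of shared tokens. The choice of main model fixes the vocabulary in which `next_token` is expressed.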

