We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the marginal likelihood of a training set under our latent variable model, the base LLM automatically learns when to generate a token itself and when to call on one of the ``assistant'' language models to generate, all without direct supervision. Token-level collaboration during decoding allows for a fusion of each model's expertise in a manner tailored to the specific task at hand. Our collaborative decoding is especially useful in cross-domain settings where a generalist base LLM learns to invoke domain expert models. On instruction-following, domain-specific QA, and reasoning tasks, we show that the performance of the joint system exceeds that of the individual models. Through qualitative analysis of the learned latent decisions, we show that models trained with our method exhibit several interesting collaboration patterns, e.g., template-filling. Our code is available at https://github.com/clinicalml/co-llm.
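To make the latent-variable decoding concrete, the following is a minimal sketch, not the paper's implementation: two toy stand-in "models" (a generalist base and a domain assistant) each return a next-token distribution, a stand-in deferral head plays the role of the learned latent decision P(Z_t = assistant | prefix), and decoding marginalizes over that latent choice at every token. All model names, vocabularies, and probabilities here are illustrative assumptions.

```python
# Toy stand-in "models": each maps a token prefix to a distribution over a
# tiny vocabulary. In the paper these would be a base LLM and an assistant
# LLM; the fixed distributions below are illustrative assumptions only.
VOCAB = ["the", "patient", "aspirin", "<eos>"]

def base_model(prefix):
    # Generalist: prefers common words regardless of context.
    return {"the": 0.5, "patient": 0.3, "aspirin": 0.05, "<eos>": 0.15}

def assistant_model(prefix):
    # Domain expert: prefers domain-specific terms.
    return {"the": 0.1, "patient": 0.2, "aspirin": 0.6, "<eos>": 0.1}

def defer_prob(prefix):
    # Stand-in for the learned head predicting P(Z_t = assistant | prefix).
    # Here it is a hard-coded rule: defer right after the word "the".
    return 0.9 if prefix and prefix[-1] == "the" else 0.1

def marginal_next_token_dist(prefix):
    # Marginalize out the latent model choice Z_t:
    #   P(x_t | prefix) = (1 - p) * P_base(x_t) + p * P_assistant(x_t)
    # The training objective maximizes exactly this marginal likelihood
    # over the training tokens, so no supervision on Z_t is needed.
    p = defer_prob(prefix)
    pb, pa = base_model(prefix), assistant_model(prefix)
    return {w: (1 - p) * pb[w] + p * pa[w] for w in VOCAB}

def collaborative_decode(max_len=5):
    # Greedy decoding under the marginal distribution; the two models'
    # tokens end up interleaved in a single output sequence.
    prefix = []
    for _ in range(max_len):
        dist = marginal_next_token_dist(prefix)
        tok = max(dist, key=dist.get)
        if tok == "<eos>":
            break
        prefix.append(tok)
    return prefix
```

With these toy distributions, the base model emits "the" and the deferral head then hands control to the assistant, which emits the domain term "aspirin", illustrating the interleaved, token-level division of labor described above.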