Ensembling the strengths of different LLM experts is critical to achieving consistent, satisfactory performance on diverse inputs across a wide range of tasks. However, existing LLM ensemble methods are either computationally intensive or incapable of leveraging the complementary knowledge among LLM experts for varied inputs. In this paper, we propose a Dynamic Ensemble Reasoning paradigm, called DER, that integrates the strengths of multiple LLM experts conditioned on dynamic inputs. Specifically, we model LLM ensemble reasoning as a Markov Decision Process (MDP), in which an agent sequentially takes an input, requests knowledge from an LLM candidate, and passes the output to a subsequent candidate. We further devise a reward function to train a DER-Agent that dynamically selects an optimal answering route for a given input question, aiming for the highest performance with as few computational resources as possible. Finally, to fully transfer expert knowledge from the prior LLMs, we develop a Knowledge Transfer Prompt (KTP) that enables subsequent LLM candidates to exploit this complementary knowledge effectively. Experiments demonstrate that our method achieves better performance with fewer computational resources than state-of-the-art baselines.
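The sequential routing described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the expert functions, `score` callback, and the reward weight `lam` are all hypothetical stand-ins, and the learned DER-Agent policy is replaced here by a simple greedy pass over the candidate sequence.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Expert:
    name: str
    cost: float                   # relative compute cost per call (assumed)
    answer: Callable[[str], str]  # toy stand-in for an LLM call

def reward(quality: float, cost: float, lam: float = 0.1) -> float:
    # Trades answer quality against compute cost, in the spirit of the
    # paper's objective; lam is a hypothetical weighting.
    return quality - lam * cost

def der_route(question: str, experts: List[Expert],
              score: Callable[[str], float], max_steps: int = 3) -> str:
    """Sequentially query candidates; each step the next candidate sees
    the prior answer via a KTP-style handoff prompt."""
    context, best_answer, best_r = question, "", float("-inf")
    for expert in experts[:max_steps]:
        ans = expert.answer(context)
        r = reward(score(ans), expert.cost)
        if r > best_r:
            best_answer, best_r = ans, r
        # KTP-style handoff: pass prior knowledge to the next candidate.
        context = f"Question: {question}\nPrior answer: {ans}\nImprove it."
    return best_answer
```

In the paper, the choice of which candidate to query next is made by the trained DER-Agent rather than by iterating over a fixed expert order as above.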