In this paper, we propose an architecture that harnesses the collective knowledge of multiple trained LLMs to create a new state of the art. At the core of this framework is an LLM-based orchestrator that picks the right underlying LLM experts for optimal task execution. Inspired by self-play in reinforcement learning, we created a loop of query generation, orchestration, and evaluation to generate training data for the orchestrator. Our evaluation focused on the MMLU benchmark, using models with 7B, 13B, and 34B parameters available on Hugging Face. The results demonstrate a new state of the art among open-source models: our Leeroo orchestrator achieves performance on par with the Mixtral model while incurring only two-thirds of its cost. Moreover, with a higher allowed cost, it surpasses Mixtral's accuracy by over 5% at the same cost level, reaching an accuracy of 75.9%. Further gains were observed when integrating GPT4 into the underlying model pool: the Leeroo orchestrator nearly matches GPT4's performance at half the cost and even exceeds GPT4's results at a 25% lower cost. These findings illustrate the potential of our architecture for creating state-of-the-art, cost-effective LLMs by optimizing the synergy between multiple LLMs.
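The trade-off between accuracy and cost that the orchestrator navigates can be illustrated with a minimal sketch. It assumes, purely for illustration, that the orchestrator outputs a predicted accuracy per expert for a given query and that selection maximizes predicted accuracy minus a weighted cost. The names `Expert`, `pick_expert`, and `cost_weight`, as well as the numbers used, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    name: str    # hypothetical label for an underlying LLM
    cost: float  # hypothetical relative cost of answering one query


def pick_expert(predicted_accuracy, experts, cost_weight):
    """Pick the expert with the best predicted accuracy after a cost penalty.

    predicted_accuracy maps expert name -> estimated probability that the
    expert answers the current query correctly (assumed to come from the
    orchestrator). cost_weight controls how strongly cost is penalized;
    0.0 ignores cost entirely.
    """
    return max(
        experts,
        key=lambda e: predicted_accuracy[e.name] - cost_weight * e.cost,
    )


if __name__ == "__main__":
    experts = [Expert("small", 1.0), Expert("large", 4.0)]
    scores = {"small": 0.6, "large": 0.8}  # made-up orchestrator estimates
    print(pick_expert(scores, experts, cost_weight=0.0).name)
    print(pick_expert(scores, experts, cost_weight=0.1).name)
```

Under this assumed scoring rule, raising `cost_weight` shifts selections toward cheaper experts, which is one simple way to trace out different points on an accuracy-versus-cost curve like the one the results describe.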