This paper introduces Patched MOA (Mixture of Agents), an inference optimization technique that significantly enhances the performance of large language models (LLMs) across diverse software development tasks. We evaluate three inference optimization algorithms (Best of N, Mixture of Agents, and Monte Carlo Tree Search) and demonstrate that Patched MOA can boost the performance of smaller models to surpass that of larger, more expensive models. Notably, our approach improves the gpt-4o-mini model's performance on the Arena-Hard-Auto benchmark by 15.52%, outperforming gpt-4-turbo at a fraction of the cost. We also apply Patched MOA to various software development workflows, showing consistent improvements in task completion rates. Our method is model-agnostic, transparent to end-users, and can be easily integrated into existing LLM pipelines. This work contributes to the growing field of LLM optimization, offering a cost-effective solution for enhancing model performance without the need for fine-tuning or larger models.
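To make the high-level idea concrete, the sketch below shows one way a mixture-of-agents style aggregation pass might wrap a single small model behind an OpenAI-compatible API: sample several candidate answers, then ask the same model to synthesize them into one improved response. The prompts, the choice of N, and the helper name `moa_style_answer` are illustrative assumptions, not the paper's Patched MOA implementation.

```python
# Illustrative sketch only: a mixture-of-agents style aggregation pass over a
# single small model via the OpenAI Python client. Prompts, N, and helper
# names are assumptions for illustration, not the Patched MOA implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def moa_style_answer(question: str, model: str = "gpt-4o-mini", n: int = 3) -> str:
    # Step 1: sample N independent candidate answers from the same model.
    candidates = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        n=n,
        temperature=0.8,
    )
    drafts = [choice.message.content for choice in candidates.choices]

    # Step 2: ask the model to aggregate the drafts into a single answer.
    aggregation_prompt = (
        "You are given several candidate answers to the same question. "
        "Synthesize them into a single, more accurate and complete answer.\n\n"
        f"Question: {question}\n\n"
        + "\n\n".join(f"Candidate {i + 1}:\n{d}" for i, d in enumerate(drafts))
    )
    final = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": aggregation_prompt}],
        temperature=0.2,
    )
    return final.choices[0].message.content


if __name__ == "__main__":
    print(moa_style_answer("Explain the difference between a mutex and a semaphore."))
```

Because the aggregation step only adds extra calls around an existing chat-completion endpoint, a wrapper of this shape can be dropped into an existing pipeline without changing the downstream interface, which is what makes the approach transparent to end-users.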