Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet choosing which model to use often involves a trade-off between performance and cost: more powerful models are more effective but also more expensive, while less capable models are cheaper. To address this dilemma, we propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference, aiming to optimize the balance between cost and response quality. We develop a training framework for these routers that leverages human preference data and data augmentation techniques to enhance performance. Our evaluation on widely recognized benchmarks shows that our approach significantly reduces costs, by more than a factor of 2 in certain cases, without compromising response quality. Interestingly, our router models also demonstrate significant transfer learning capabilities, maintaining their performance even when the strong and weak models are changed at test time. This highlights the potential of these routers to provide a cost-effective yet high-performance solution for deploying LLMs.
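To make the routing idea concrete, the sketch below shows one common way a binary router can act as a cost/quality knob: it sends a query to the strong model only when a predicted probability that the strong model's answer is preferred meets a threshold. This is an illustrative assumption, not the trained routers described above. `Model`, `route`, `total_cost`, the relative cost figures, and the length-based `toy_win_prob` scorer are all hypothetical stand-ins for a learned router.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_query: float  # hypothetical relative cost units, not real prices


def route(query: str,
          win_prob: Callable[[str], float],
          strong: Model,
          weak: Model,
          threshold: float) -> Model:
    # Route to the strong model only when the (hypothetical) router's estimated
    # probability that the strong model wins meets the threshold.
    return strong if win_prob(query) >= threshold else weak


def total_cost(queries: List[str],
               win_prob: Callable[[str], float],
               strong: Model,
               weak: Model,
               threshold: float) -> float:
    # Sum the per-query cost of whichever model each query is routed to.
    return sum(route(q, win_prob, strong, weak, threshold).cost_per_query
               for q in queries)


def toy_win_prob(query: str) -> float:
    # Hypothetical stand-in for a learned router: longer queries are assumed
    # to benefit more from the strong model.
    return min(1.0, len(query.split()) / 20)


STRONG = Model("strong-llm", 10.0)
WEAK = Model("weak-llm", 1.0)
QUERIES = ["hi", "a b c d e f g h i j k l m n o p q r s t"]

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.1):
        print(t, total_cost(QUERIES, toy_win_prob, STRONG, WEAK, t))
```

Raising the threshold sends more queries to the cheaper model: with these toy numbers, a threshold of 0.0 routes everything to the strong model (cost 20.0), 0.5 routes only the long query there (cost 11.0), and 1.1 routes everything to the weak model (cost 2.0).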