With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective methods of responding to LLM queries. Yet no single LLM efficiently balances this trilemma: some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present PolyRouter, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes each incoming query to the best-performing expert based on the query's requirements. Through extensive experiments, we demonstrate that, compared to standalone expert models, PolyRouter improves query efficiency by up to 40% and reduces cost by up to 30%, while maintaining or enhancing model performance by up to 10%.
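The core idea of routing each query to the expert that best balances quality, cost, and latency can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the `Expert` fields, the linear scoring weights, and the expert names are hypothetical and do not reflect PolyRouter's actual (likely learned) routing policy.

```python
from dataclasses import dataclass

@dataclass
class Expert:
    name: str
    quality: float   # expected response quality in [0, 1] (assumed metric)
    cost: float      # dollars per 1K tokens (assumed units)
    latency: float   # seconds per query (assumed units)

def route(experts, w_quality=1.0, w_cost=0.5, w_latency=0.2):
    """Pick the expert maximizing a simple quality/cost/latency trade-off.

    The linear score and default weights are illustrative stand-ins for
    whatever query-aware policy the real router would learn.
    """
    def score(e):
        return w_quality * e.quality - w_cost * e.cost - w_latency * e.latency
    return max(experts, key=score)

# Hypothetical expert pool: one powerful-but-costly model, one cheap-and-fast one.
experts = [
    Expert("large-llm", quality=0.95, cost=0.60, latency=3.0),
    Expert("small-llm", quality=0.70, cost=0.05, latency=0.5),
]

print(route(experts).name)  # with these weights, the cheap expert wins
```

In a real deployment the weights would vary per query (e.g. a hard reasoning query would raise `w_quality`), which is what makes the routing dynamic rather than a fixed model choice.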