Complex agentic AI systems, powered by a coordinated ensemble of Large Language Models (LLMs) together with tool and memory modules, have demonstrated remarkable capabilities on intricate, multi-turn tasks. However, this success is shadowed by prohibitive economic costs and severe latency, exposing a critical yet underexplored trade-off. We formalize this challenge as the \textbf{Agent System Trilemma}: the inherent tension among achieving state-of-the-art performance, minimizing monetary cost, and ensuring rapid task completion. To dismantle this trilemma, we introduce EvoRoute, a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Leveraging an ever-expanding knowledge base of prior experience, EvoRoute dynamically selects Pareto-optimal LLM backbones at each step, balancing accuracy, efficiency, and resource use, while continually refining its own selection policy through environment feedback. Experiments on challenging agentic benchmarks such as GAIA and BrowseComp+ demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to $80\%$ and latency by over $70\%$.
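The core selection step, filtering candidate backbones to those that are Pareto-optimal over quality, cost, and latency, can be sketched as follows. This is a minimal illustrative sketch, not EvoRoute's actual policy: the model names, scores, and the three axes used here are assumptions for illustration.

```python
# Hypothetical sketch of a per-step Pareto filter: keep only LLM backbones
# that no other candidate dominates on (quality to maximize, cost and
# latency to minimize). Candidate models and scores are illustrative.

def dominates(a, b):
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = (a["quality"] >= b["quality"]
                and a["cost"] <= b["cost"]
                and a["latency"] <= b["latency"])
    strictly_better = (a["quality"] > b["quality"]
                       or a["cost"] < b["cost"]
                       or a["latency"] < b["latency"])
    return no_worse and strictly_better

def pareto_front(models):
    """Return the non-dominated subset of candidate models."""
    return [m for m in models
            if not any(dominates(o, m) for o in models if o is not m)]

candidates = [
    {"name": "large",  "quality": 0.92, "cost": 10.0, "latency": 8.0},
    {"name": "medium", "quality": 0.85, "cost": 3.0,  "latency": 3.0},
    {"name": "small",  "quality": 0.70, "cost": 0.5,  "latency": 1.0},
    {"name": "slow",   "quality": 0.80, "cost": 4.0,  "latency": 9.0},  # dominated by "medium"
]

front = pareto_front(candidates)
print([m["name"] for m in front])  # "slow" is filtered out; the rest trade off quality vs. cost/latency
```

A router in this style would then pick one model from the surviving front, e.g. by weighting the three axes according to the current task's budget; that weighting, and how feedback updates it, is where the self-evolving policy described above would come in.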