Large language models (LLMs) are now used to power complex multi-turn agentic workflows. Existing systems run agentic inference by loosely assembling isolated components: an LLM inference engine (e.g., vLLM) and a tool orchestrator (e.g., Kubernetes). Although agentic workflows involve multiple LLM and tool requests, these systems schedule and allocate resources separately on a per-request basis, without end-to-end knowledge of the workflow. This leads to suboptimal management of the KV cache and of tool execution environments. To address these challenges, we propose ThunderAgent, a fast, simple, and program-aware agentic inference system. We first abstract agentic workflows as LLM Programs, enabling a unified view of heterogeneous resources, including KV caches, system states, and external tool assets such as disk memory and network ports. Built upon this abstraction, ThunderAgent introduces a program-aware scheduler and a tool resource manager designed to maximize KV cache hit rates, mitigate memory imbalances, and enable asynchronous environment preparation. Evaluations across coding, routing, and scientific discovery agents demonstrate that, compared to state-of-the-art inference systems, ThunderAgent achieves 1.5–3.6× higher throughput in serving, 1.8–3.9× higher throughput in RL rollout, and up to 4.2× disk memory savings. To facilitate reproducibility and support future development, we open-source the full implementation of ThunderAgent at https://github.com/Agentic-Kinetics/ThunderAgent.
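To make the idea of a program-level (rather than request-level) view more concrete, here is a minimal sketch in Python. It is not ThunderAgent's implementation: the `LLMProgram` record, the `Phase` enum, and the `pick_programs_to_pause` heuristic are all hypothetical names introduced for illustration. The sketch groups each workflow's KV cache footprint and tool assets (disk, ports) into one record, and under KV memory pressure it pauses whole programs, preferring those currently waiting on a tool call, instead of evicting individual requests.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    REASONING = "reasoning"  # program is waiting on or running an LLM call
    ACTING = "acting"        # program is waiting on a tool call


@dataclass
class LLMProgram:
    """Hypothetical record of one agentic workflow and the resources it holds."""
    program_id: str
    phase: Phase = Phase.REASONING
    kv_cache_tokens: int = 0                       # context tokens cached on GPU
    disk_bytes: int = 0                            # tool sandbox disk usage
    ports: set[int] = field(default_factory=set)   # network ports held by tools


def pick_programs_to_pause(programs: list[LLMProgram], kv_budget: int) -> list[str]:
    """Toy program-aware heuristic (an assumption, not the paper's policy).

    If the total cached tokens exceed kv_budget, pause whole programs until
    the total fits. Programs in a tool call (ACTING) are paused first, since
    they do not need the GPU right now; ties go to the larger KV footprint.
    """
    total = sum(p.kv_cache_tokens for p in programs)
    # False sorts before True, so ACTING programs come first.
    order = sorted(
        programs,
        key=lambda p: (p.phase is not Phase.ACTING, -p.kv_cache_tokens),
    )
    paused: list[str] = []
    for p in order:
        if total <= kv_budget:
            break
        paused.append(p.program_id)
        total -= p.kv_cache_tokens
    return paused


if __name__ == "__main__":
    programs = [
        LLMProgram("a", Phase.REASONING, kv_cache_tokens=4000),
        LLMProgram("b", Phase.ACTING, kv_cache_tokens=3000),
        LLMProgram("c", Phase.ACTING, kv_cache_tokens=1000),
    ]
    print(pick_programs_to_pause(programs, kv_budget=5000))  # ['b']
```

Operating on whole programs keeps each workflow's cached context all-or-nothing, which is one simple way to favor KV cache reuse across a program's consecutive LLM turns; a real scheduler would also have to decide when to resume paused programs.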