We introduce Cortex, a prototype workflow-aware serving platform designed for agentic workloads. The core principle of Cortex is stage isolation: it provisions a dedicated resource pool for each distinct stage of an agentic workflow. This simple yet powerful strategy mitigates inter-stage interference in compute and memory, leading to better KV cache utilization, higher throughput, and more predictable performance. By customizing resource allocation and scheduling within each stage, Cortex lays the groundwork for more advanced, agent-native serving paradigms, including malleable resource management, speculative execution of workflow branches, and a shared, multi-tiered cache for "agentic state."
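To make the idea of stage isolation concrete, the following is a minimal sketch, not Cortex's actual implementation. It assumes a workflow with three stages named `plan`, `tool_call`, and `summarize`; those stage names, the `StagePool` and `StageRouter` classes, and the round-robin placement policy are all hypothetical and chosen only for illustration. The point it shows is that a request is only ever placed on workers belonging to its own stage's pool, so load in one stage cannot occupy another stage's workers.

```python
from dataclasses import dataclass, field


@dataclass
class StagePool:
    """Hypothetical dedicated worker pool for one workflow stage."""
    stage: str
    num_workers: int
    queue: list = field(default_factory=list)

    def submit(self, request_id: str) -> int:
        # Round-robin placement inside this pool (an assumed policy);
        # returns the index of the worker that receives the request.
        worker = len(self.queue) % self.num_workers
        self.queue.append((request_id, worker))
        return worker


class StageRouter:
    """Routes each request to the pool owned by its stage (stage isolation)."""

    def __init__(self, pool_sizes: dict[str, int]):
        # One pool per stage; pools never share workers.
        self.pools = {s: StagePool(s, n) for s, n in pool_sizes.items()}

    def route(self, stage: str, request_id: str) -> tuple[str, int]:
        if stage not in self.pools:
            raise KeyError(f"no pool provisioned for stage {stage!r}")
        return stage, self.pools[stage].submit(request_id)


# Hypothetical stage names and pool sizes.
router = StageRouter({"plan": 2, "tool_call": 1, "summarize": 3})
placements = [
    router.route("plan", "r1"),
    router.route("plan", "r2"),
    router.route("summarize", "r3"),
    router.route("plan", "r4"),
]
print(placements)
```

In this sketch the three `plan` requests cycle over the two `plan` workers, while the `tool_call` pool stays idle and untouched. A real system would additionally size each pool and choose a scheduling policy per stage, which is the customization the abstract refers to.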