stratum: A System Infrastructure for Massive Agent-Centric ML Workloads

Recent advances in large language models (LLMs) are transforming how machine learning (ML) pipelines are developed and evaluated. LLMs enable a new type of workload, agentic pipeline search, in which autonomous or semi-autonomous agents generate, validate, and optimize complete ML pipelines. These agents predominantly operate over popular Python ML libraries and exhibit highly exploratory behavior, which results in thousands of executions for data profiling, pipeline generation, and iterative refinement of pipeline stages. However, the existing Python-based ML ecosystem is built around libraries such as Pandas and scikit-learn, which are designed for human-centric, interactive, sequential workflows and remain constrained by Python's interpreted execution model, library-level isolation, and limited runtime support for executing large numbers of pipelines. Meanwhile, many high-performance ML systems proposed by the systems community either target narrow workload classes or require specialized programming models, which limits their integration with the Python ML ecosystem and makes them largely ill-suited for LLM-based agents. This growing mismatch exposes a fundamental systems challenge in supporting agentic pipeline search at scale. We therefore propose stratum, a unified system infrastructure that decouples pipeline execution from planning and reasoning during agentic pipeline search. Stratum integrates seamlessly with existing Python libraries, compiles batches of pipelines into optimized execution graphs, and executes them efficiently across heterogeneous backends, including a novel Rust-based runtime. We present stratum's architectural vision along with an early prototype, discuss key design decisions, and outline open challenges and research directions. Finally, preliminary experiments show that stratum can significantly speed up large-scale agentic pipeline search, by up to 16.6x.
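To make "compiles batches of pipelines into optimized execution graphs" concrete, here is a minimal Python sketch of one plausible optimization, assuming that candidate pipelines proposed by an agent often share identical leading steps. It is not stratum's API or its actual optimizer: the step encoding and the names `Node`, `build_graph`, `count_ops`, and `execute`, as well as the toy operators, are hypothetical. The batch is merged into a prefix tree so that each shared step runs once and feeds every pipeline that depends on it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical step encoding: (operator name, keyword parameters as a tuple of pairs).
# Tuples keep steps hashable, so identical steps can be detected and shared.
Step = Tuple[str, Tuple[Tuple[str, object], ...]]


@dataclass
class Node:
    step: Step
    children: Dict[Step, "Node"] = field(default_factory=dict)
    pipeline_ids: List[int] = field(default_factory=list)  # pipelines that end here


def build_graph(pipelines: List[List[Step]]) -> Node:
    """Merge a batch of pipelines into a prefix tree: shared leading steps become one node."""
    root = Node(step=("source", ()))
    for pid, steps in enumerate(pipelines):
        node = root
        for step in steps:
            node = node.children.setdefault(step, Node(step=step))
        node.pipeline_ids.append(pid)
    return root


def count_ops(node: Node) -> int:
    """Number of operator invocations needed to run the merged graph once."""
    return sum(1 + count_ops(child) for child in node.children.values())


def execute(node: Node, value, ops: Dict[str, Callable], results: Dict[int, object]):
    """Depth-first execution: each node runs once and feeds all of its children."""
    for (name, params), child in node.children.items():
        out = ops[name](value, **dict(params))
        for pid in child.pipeline_ids:
            results[pid] = out
        execute(child, out, ops, results)
    return results


# Toy operators on a list of numbers (stand-ins for real library calls).
ops = {
    "impute": lambda xs, fill: [fill if x is None else x for x in xs],
    "scale": lambda xs, factor: [x * factor for x in xs],
    "clip": lambda xs, hi: [min(x, hi) for x in xs],
}

# Three candidate pipelines an agent might propose; all start with the same imputation.
impute0 = ("impute", (("fill", 0),))
pipelines = [
    [impute0, ("scale", (("factor", 2),))],
    [impute0, ("scale", (("factor", 10),))],
    [impute0, ("clip", (("hi", 2),))],
]

root = build_graph(pipelines)
print(count_ops(root), sum(len(p) for p in pipelines))  # 4 6
results = execute(root, [1, None, 3], ops, {})
print(results)  # {0: [2, 0, 6], 1: [10, 0, 30], 2: [1, 0, 2]}
```

In this toy batch, the shared imputation runs once instead of three times (4 operator calls instead of 6). The savings grow with the number of candidates that share a prefix. The sketch ignores fitted state, data splits, and backend placement.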
