Can AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Each agent specializes in reasoning, experimentation, or analysis, and the agents collaborate through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that, by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.