Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Its agents specialize in reasoning, experimentation, and analysis, and they collaborate through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning process. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.