Can AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired multi-agent workflow. Each agent specializes in reasoning, experimentation, and analysis, collaborating through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that, by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.