SysLLMatic: Large Language Models are Software System Optimizers

Huiyun Peng, Arjun Gupte, Ryan Hasler, Nicholas John Eliopoulos, Chien-Chou Ho, Rishi Mantri, Leo Deng, Konstantin Läufer, George K. Thiruvathukal, James C. Davis

Automatic software system optimization can improve software speed, reduce operating costs, and save energy. Traditional approaches to optimization rely on manual tuning and compiler heuristics, limiting their ability to generalize across diverse codebases and system contexts. Recent methods using Large Language Models (LLMs) introduce automation on simple programs, but they do not scale effectively to the complexity and size of real-world software systems. We present SysLLMatic, a system that integrates LLMs with performance diagnostics and a curated catalog of 43 optimization patterns to automatically optimize software systems. By leveraging profiling to identify performance hotspots, our approach enables LLMs to optimize real-world software beyond isolated code snippets. We evaluate it on three benchmark suites: HumanEval_CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapo (large-scale software systems in Java). Results show that SysLLMatic can improve software system performance, including latency, throughput, energy efficiency, memory usage, and CPU utilization. It consistently outperforms state-of-the-art LLM baselines on microbenchmarks. On large-scale application codes, to which prior LLM approaches have not scaled, it surpasses compiler optimizations, achieving average relative improvements of 1.54x in latency (vs. 1.01x for the compiler) and 1.24x in energy (vs. 1.08x for the compiler). Our findings demonstrate that LLMs, guided by performance knowledge through the optimization pattern catalog and appropriate performance diagnostics, can serve as viable software system optimizers. We further identify limitations of our approach and the challenges involved in handling complex applications. This work provides a foundation for generating optimized code across various languages, benchmarks, and program sizes in a principled manner.
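To make figures such as "1.54x in latency" concrete, the following is a minimal sketch of how an average relative improvement could be computed. It rests on two assumptions that the abstract does not state: that relative improvement is the ratio of the baseline measurement to the optimized measurement for a lower-is-better metric, and that the average is an arithmetic mean across benchmarks. The function names and the sample measurements are hypothetical and are not taken from the paper.

```python
from statistics import mean


def relative_improvement(baseline: float, optimized: float) -> float:
    """Return baseline / optimized for a lower-is-better metric.

    A value above 1.0 means the optimized program is better; 1.0 means
    no change. This definition is an assumption, not the paper's.
    """
    if baseline <= 0 or optimized <= 0:
        raise ValueError("measurements must be positive")
    return baseline / optimized


def average_relative_improvement(pairs):
    """Arithmetic mean of per-benchmark ratios (the averaging method is assumed)."""
    return mean(relative_improvement(b, o) for b, o in pairs)


# Hypothetical (baseline, optimized) latencies in seconds; not from the paper.
latencies = [(2.0, 1.0), (3.0, 3.0), (4.5, 3.0)]
avg = average_relative_improvement(latencies)
print(f"average relative improvement: {avg:.2f}x")
```

Under this reading, a ratio of 1.01x for the compiler would mean the compiled-with-optimizations program is almost unchanged in latency, while 1.54x would correspond to roughly a 35% reduction in latency on average.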
