Large language models (LLMs) show promise for automated code optimization. However, without performance context, they struggle to produce correct and effective code transformations. Existing performance tools can identify bottlenecks but stop short of generating actionable code changes. Consequently, performance optimization continues to be a time-intensive and manual endeavor, typically undertaken only by experts with detailed architectural understanding. To bridge this gap, we introduce Optimas, a modular, fully automated, end-to-end generative AI framework built on a multi-agent workflow. Optimas uses LLMs to map performance diagnostics from multiple reports to established, literature-backed code transformations, while unifying insight extraction, code generation, execution, and validation within a single pipeline. Across 3,410 real-world experiments on 10 benchmarks and two HPC mini-applications, Optimas generates 100% correct code and improves performance in over 98.82% of those experiments, achieving average gains of 8.02%-79.09% on NVIDIA GPUs.
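The pipeline described above (diagnostics mapped to known transformations, followed by generation, execution, and validation) can be illustrated with a minimal sketch. This is not Optimas's implementation: the `Diagnostic` schema, the `TRANSFORMATIONS` catalog and its entries, and the `generate` and `run` callables are all hypothetical stand-ins chosen for illustration, and the LLM call is abstracted behind `generate`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Diagnostic:
    """One finding extracted from a performance report (hypothetical schema)."""
    source: str  # which report the finding came from
    issue: str   # normalized bottleneck label


# Hypothetical catalog: bottleneck label -> candidate code transformation.
# The entries are illustrative examples, not the catalog used by Optimas.
TRANSFORMATIONS: Dict[str, str] = {
    "uncoalesced_global_access": "restructure indexing for coalesced loads",
    "low_occupancy": "tune block size or reduce register pressure",
    "redundant_global_loads": "stage reused data in shared memory",
}


def select_transformations(diags: List[Diagnostic]) -> List[str]:
    """Map diagnostics to transformations, skipping unknowns and duplicates."""
    chosen: List[str] = []
    for d in diags:
        t = TRANSFORMATIONS.get(d.issue)
        if t is not None and t not in chosen:
            chosen.append(t)
    return chosen


def optimize(
    source: str,
    diags: List[Diagnostic],
    generate: Callable[[str, List[str]], str],  # assumed LLM-backed rewriter
    run: Callable[[str], Tuple[str, float]],    # assumed: returns (output, seconds)
    max_attempts: int = 3,
) -> str:
    """Return a validated, faster candidate, or the original source."""
    baseline_out, baseline_time = run(source)
    plan = select_transformations(diags)
    for _ in range(max_attempts):
        candidate = generate(source, plan)
        out, elapsed = run(candidate)
        # Accept only if the output matches the baseline and runtime improves.
        if out == baseline_out and elapsed < baseline_time:
            return candidate
    return source
```

In this sketch, validation is a simple comparison against the baseline output, so a candidate that runs faster but produces different results is rejected and the original source is kept.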