Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinatorial space of low-level transformations under noisy and expensive hardware feedback. Although large language models can synthesize functionally correct CUDA code, achieving competitive performance requires systematic exploration and verification of optimization choices. We present OptiML, an end-to-end framework that maps either natural-language intent or input CUDA code to performance-optimized CUDA kernels by formulating kernel optimization as search under verification. OptiML consists of two decoupled stages. When the input is natural language, a Mixture-of-Thoughts generator (OptiML-G) acts as a proposal policy over kernel implementation strategies, producing an initial executable program. A search-based optimizer (OptiML-X) then refines either synthesized or user-provided kernels using Monte Carlo Tree Search over LLM-driven edits, guided by a hardware-aware reward derived from profiler feedback. Each candidate transformation is compiled, verified, and profiled with Nsight Compute, and evaluated by a composite objective that combines runtime with hardware bottleneck proxies and guardrails against regressions. We evaluate OptiML in both synthesis-and-optimize and optimization-only settings on a diverse suite of CUDA kernels. Results show that OptiML consistently discovers verified performance improvements over strong LLM baselines and produces interpretable optimization trajectories grounded in profiler evidence.
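To make the idea of a hardware-aware composite objective concrete, the following is a minimal sketch of how runtime, a bottleneck proxy, and regression guardrails might be combined into a single reward, together with a standard UCT score of the kind commonly used for node selection in Monte Carlo Tree Search. The abstract does not specify the actual objective, so everything here is an assumption made for illustration: the `Profile` fields, the weights, the log-speedup form, the failure reward, and the use of UCT are all hypothetical and should not be read as OptiML's implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical reward assigned to candidates that fail compilation or verification.
FAILURE_REWARD = -10.0


@dataclass
class Profile:
    """Hypothetical summary of one compiled, verified, and profiled kernel."""
    compiled: bool
    correct: bool
    runtime_ms: float
    occupancy: float  # achieved occupancy in [0, 1], used as a bottleneck proxy


def composite_reward(base: Profile, cand: Profile,
                     w_time: float = 1.0, w_proxy: float = 0.2,
                     regression_penalty: float = 1.0) -> float:
    """Score a candidate edit relative to the baseline kernel (illustrative only)."""
    # Guardrail: a candidate that does not compile or verify is never rewarded.
    if not (cand.compiled and cand.correct):
        return FAILURE_REWARD
    speedup = base.runtime_ms / cand.runtime_ms
    # Log-speedup is symmetric: 2x faster and 2x slower have equal magnitude.
    reward = w_time * math.log(speedup)
    # Bottleneck proxy: credit edits that improve occupancy over the baseline.
    reward += w_proxy * (cand.occupancy - base.occupancy)
    # Guardrail: add an extra penalty when the candidate is slower than baseline.
    if speedup < 1.0:
        reward -= regression_penalty * (1.0 - speedup)
    return max(reward, FAILURE_REWARD)


def uct_score(total_reward: float, visits: int, parent_visits: int,
              c: float = 1.4) -> float:
    """Standard UCT: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

In a search loop of this kind, each tree node would hold one kernel variant, each child one proposed edit, and the reward from profiling a child would be backed up along the path to the root. Using a log-speedup term keeps the reward scale comparable across kernels with very different absolute runtimes, which is one reasonable design choice among several.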