PassNet: Scaling Large Language Models for Graph Compiler Pass Generation

Yiqun Liu, Yingsheng Wu, Ruqi Yang, Enrong Zheng, Honglei Qiu, Sijun He, Tai Liang, Jingjing Wu, Yuhan Zhou, Yiwei Zhang, Dongyan Chen, Weihan Yi, Xinqi Li, Siqi Bao

Source: arXiv. Code and data are available at https://github.com/PaddlePaddle/PassNet

Modern tensor compilers such as TorchInductor deliver substantial speedups on mainstream models, yet face a systematic performance ceiling on long-tail workloads -- our profiling shows that 43% of real-world subgraphs experience end-to-end slowdowns under default compilation. While LLMs offer a path toward automated optimization, existing efforts focus on standalone kernel generation. We argue that pass generation -- where LLMs author structured graph transformations that integrate directly into compiler pipelines -- is the more appropriate abstraction. We propose PassNet, the first large-scale ecosystem for LLM-based compiler pass generation, comprising: (1) PassNet-Dataset, over 18K unique computational graphs from 100K real-world models; and (2) PassBench, 200 curated long-tail fusible tasks (comprising 2,060 subgraphs in total) evaluated under the Error-aware Speedup Score (ES_t) -- a metric unifying correctness, stability, and performance -- with layered integrity defenses against systematic LLM exploitation. Experiments reveal that PassBench is both highly discriminative and genuinely unsaturated: the best frontier model trails TorchInductor by 37% in aggregate, yet on individual subgraphs LLMs achieve up to 3x speedup over the same compiler -- indicating that the bottleneck is consistency, not capability. Fine-tuning a small model on merely ~4K PassNet trajectories yields a 2.67x improvement approaching frontier-model performance, demonstrating substantial headroom and validating PassNet as live training infrastructure for advancing LLM-driven compiler optimization. All data, benchmarks, and tooling are publicly available.
