While model serving has unlocked unprecedented capabilities, the high cost of serving large-scale models remains a significant barrier to widespread accessibility and rapid innovation. Compiler optimizations have long driven substantial performance improvements, but existing compilers struggle with neural workloads because the space of possible transformations is exponentially large and highly interdependent. Although stochastic search techniques can be effective, they are often sample-inefficient and fail to exploit the structural context underlying compilation decisions. We investigate whether reasoning with large language models (LLMs), without any retraining, can leverage the context-aware decision space of compiler optimizations to significantly improve sample efficiency. To that end, we introduce a novel compilation framework (dubbed Reasoning Compiler) that formulates optimization as a sequential, context-aware decision process guided by an LLM and structured Monte Carlo tree search (MCTS). The LLM acts as a proposal mechanism, suggesting hardware-informed transformations that reflect the current program state and accumulated performance feedback. MCTS incorporates these proposals to balance exploration and exploitation, enabling structured, context-sensitive traversal of the expansive optimization space. By achieving substantial speedups with markedly fewer samples than leading neural compilers, our approach demonstrates the potential of LLM-guided reasoning to transform the landscape of compiler optimization.
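The loop described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: `propose_transformations` stands in for an LLM queried with the current program state, `measure_speedup` stands in for compiling and benchmarking, and the transformation names, prior scores, and gain numbers are all hypothetical. Proposals carry prior scores that bias a PUCT-style selection rule, and measured performance is backpropagated along the search path.

```python
import math

def propose_transformations(state):
    """Stand-in for the LLM proposal mechanism: candidate transformations
    with prior scores, conditioned on the current program state."""
    return [("tile", 0.5), ("vectorize", 0.3), ("unroll", 0.2)]

def measure_speedup(state):
    """Stand-in for compile-and-benchmark: a toy multiplicative model where
    each applied transformation contributes a fixed gain."""
    gains = {"tile": 1.4, "vectorize": 1.2, "unroll": 1.1}
    speedup = 1.0
    for t in state:
        speedup *= gains[t]
    return speedup

class Node:
    def __init__(self, state, prior=1.0, parent=None):
        self.state = state      # tuple of transformations applied so far
        self.prior = prior      # proposal score assigned by the "LLM"
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # sum of observed speedups

def puct(node, c):
    """PUCT-style score: mean observed reward plus prior-weighted exploration."""
    q = node.value / node.visits if node.visits else 0.0
    u = c * node.prior * math.sqrt(node.parent.visits + 1) / (1 + node.visits)
    return q + u

def search(iterations=50, max_depth=2, c=1.0):
    root = Node(state=())
    best_state, best_reward = (), 1.0
    for _ in range(iterations):
        node = root
        # Selection: descend by PUCT while children exist and depth allows.
        while node.children and len(node.state) < max_depth:
            node = max(node.children, key=lambda ch: puct(ch, c))
        # Expansion: ask the (stubbed) LLM for context-aware proposals.
        if len(node.state) < max_depth and not node.children:
            for t, p in propose_transformations(node.state):
                if t not in node.state:  # no repeated passes in this sketch
                    node.children.append(Node(node.state + (t,), p, node))
            if node.children:
                node = max(node.children, key=lambda ch: ch.prior)
        # Evaluation: benchmark the candidate transformation sequence.
        reward = measure_speedup(node.state)
        if reward > best_reward:
            best_state, best_reward = node.state, reward
        # Backpropagation: accumulate performance feedback along the path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best_state, best_reward
```

Using LLM scores as priors in the PUCT term is one plausible way to let proposals steer exploration while MCTS retains the ability to override them when measured performance disagrees; the actual coupling between the LLM and the tree search in the paper may differ.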