Protean Compiler: An Agile Framework to Drive Fine-grain Phase Ordering

The phase ordering problem has been a long-standing challenge since the late 1970s, and it remains open because the optimization space is vast and unbounded: there is no finite solution, although the scope can be limited by reducing the number of optimizations and the length of optimization sequences. Traditionally, such locally optimized decisions are made by hand-coded algorithms tuned for a small number of benchmarks, and these often require significant effort to retune when the benchmark suite changes. Over the past 20 years, machine learning has been employed to construct performance models that improve the selection and ordering of compiler optimizations. However, these approaches are not seamlessly built into the compiler and have never materialized at the fine-grained scope of code segments. This paper presents Protean Compiler, an agile framework that equips LLVM with built-in phase-ordering capabilities at a fine-grained scope. The framework also comprises a complete library of more than 140 handcrafted static feature collection methods at varying scopes. Experimental results show speedups over LLVM's O3 of up to 4.1% on average and up to 15.7% on select Cbench applications, at the cost of only a few extra seconds of build time on Cbench. Additionally, Protean Compiler allows easy integration with third-party ML frameworks and Large Language Models; this two-step optimization yields speedups of 10.1% and 8.5% over O3 on Cbench's Susan and Jpeg applications, respectively. Protean Compiler is seamlessly integrated into LLVM and can be used as a new, enhanced, full-fledged compiler. We plan to release the project to the open-source community in the near future.
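
To give a rough sense of why the optimization space is described as vast, and why limiting the number and length of optimizations helps, the following minimal sketch counts ordered pass sequences when repetition is allowed. The function name and the pass counts and lengths are hypothetical values chosen for illustration; they are not taken from the paper or from LLVM.

```python
def count_pass_sequences(num_passes: int, max_length: int) -> int:
    """Count ordered pass sequences of length 1..max_length, with repetition allowed.

    A sequence of length k can be formed in num_passes ** k ways, so the total
    is the sum of num_passes ** k for k = 1..max_length.
    """
    return sum(num_passes ** k for k in range(1, max_length + 1))


# Hypothetical sizes, for illustration only (not figures from the paper).
full = count_pass_sequences(num_passes=60, max_length=20)
limited = count_pass_sequences(num_passes=10, max_length=5)

print(f"60 passes, length <= 20: about {full:.3e} sequences")
print(f"10 passes, length <= 5:  {limited} sequences")
```

Without a cap on sequence length the count grows without bound, which is one way to read the "unbounded" nature of the problem; capping both the pass set and the sequence length makes the space finite and far smaller.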
