With the rise of specialized hardware and new programming languages, code optimization has shifted its focus towards promoting data locality. Most production-grade compilers adopt a control-centric mindset (instruction-driven optimization augmented with scalar-based dataflow), whereas other approaches provide domain-specific and general-purpose data movement minimization but can miss important control-flow optimizations. As the two representations are not commutable, users must choose one over the other. In this paper, we explore how control-centric and data-centric approaches can work in tandem via the Multi-Level Intermediate Representation (MLIR) framework. Through a combination of an MLIR dialect and specialized passes, we recover parametric, symbolic dataflow that can be optimized within the DaCe framework. We combine the two views into a single pipeline, called DCIR, and show that it is strictly more powerful than either view alone. On several benchmarks and a real-world application in C, we show that our proposed pipeline consistently outperforms MLIR and automatically uncovers new optimization opportunities with no additional effort.