An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation

With the increasing demand for computing capability under limited resource and power budgets, it is crucial to deploy applications to customized accelerators such as FPGAs. However, FPGA programming is non-trivial. Although existing high-level synthesis (HLS) tools improve productivity to a certain extent, their scope and capability remain too limited to support a sufficient range of FPGA-oriented optimizations. This paper focuses on FPGA-based accelerators and proposes POM, an optimizing framework built on multi-level intermediate representation (MLIR). Three features demonstrate POM's scope and capability for performance optimization. First, most HLS tools depend exclusively on a single-level IR to perform all optimizations, which loads the IR with excessive information and makes debugging arduous. In contrast, POM introduces three layers of IR so that each operation is performed at a suitable abstraction level, which streamlines implementation and debugging and makes the framework more flexible, extensible, and systematic. Second, POM integrates the polyhedral model into MLIR, enabling advanced dependence analysis and various FPGA-oriented loop transformations. Because nested loops are represented with integer sets and maps, loop transformations can be carried out conveniently by manipulating these polyhedral semantics. Finally, to further reduce design effort, POM provides a user-friendly programming interface, a domain-specific language (DSL), that allows a concise description of computation and includes a rich collection of scheduling primitives. An automatic design space exploration (DSE) engine searches efficiently for high-performance optimization schemes and generates optimized accelerators automatically. Experimental results show that POM achieves a $6.46\times$ average speedup on typical benchmark suites and a $6.06\times$ average speedup on real-world applications compared to the state-of-the-art.
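
To make the polyhedral idea concrete, the following is a minimal Python sketch of a two-deep loop nest represented as an integer set, with loop transformations expressed as affine maps on iteration points. It is not POM's interface or implementation: the function names (`domain`, `interchange`, `skew_then_interchange`, `schedule`, `is_legal`) and the example dependence distance `(1, -1)` are assumptions chosen for illustration only.

```python
from itertools import product

def domain(n, m):
    """Integer set {(i, j) : 0 <= i < n, 0 <= j < m} (hypothetical loop nest)."""
    return [(i, j) for i, j in product(range(n), range(m))]

def identity(point):
    """Original schedule: run iteration (i, j) at time (i, j)."""
    return point

def interchange(point):
    """Affine map (i, j) -> (j, i): swap the two loops."""
    i, j = point
    return (j, i)

def skew_then_interchange(point):
    """Affine map (i, j) -> (i + j, i): skew the inner loop, then swap."""
    i, j = point
    return (i + j, i)

def schedule(points, transform):
    """Execution order: sort iterations lexicographically by their new time stamp."""
    return sorted(points, key=transform)

def is_legal(points, deps, transform):
    """A transformation is legal if every dependence source still runs
    strictly before its sink under the new schedule."""
    point_set = set(points)
    for p in points:
        for d in deps:
            q = (p[0] + d[0], p[1] + d[1])  # iteration that depends on p
            if q in point_set and not transform(p) < transform(q):
                return False
    return True

# Assumed dependence: iteration (i + 1, j - 1) reads what (i, j) wrote.
DEPS = [(1, -1)]
POINTS = domain(4, 4)
```

Under this assumed dependence, `is_legal(POINTS, DEPS, interchange)` is false while `is_legal(POINTS, DEPS, skew_then_interchange)` is true, which illustrates why dependence analysis has to accompany each loop transformation.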
