High-level synthesis (HLS), source-to-source compilers, and various Design Space Exploration (DSE) techniques for pragma insertion have significantly improved the Quality of Results (QoR) of generated designs. These tools reduce development time and enhance performance. However, achieving a high QoR often still requires additional manual code transformations and tiling selections, which are typically performed separately or as pre-processing steps. Although DSE techniques can apply code transformations upfront, the vastness of the search space often prevents exploring all possible transformations, making it difficult to determine which ones are necessary. Ensuring correctness is also challenging, especially for complex transformations and optimizations. To tackle this obstacle, we first propose a comprehensive framework that leverages HLS compilers. Our system streamlines code transformation, pragma insertion, and tile-size selection for on-chip data caching through a unified optimization problem, aiming to enhance parallelization, which is particularly beneficial for compute-bound kernels. Then, employing a novel Non-Linear Programming (NLP) approach, we simultaneously determine transformations, pragmas, and tile sizes, focusing on regular loop-based kernels. Our evaluation demonstrates that our framework accurately identifies the appropriate transformations, including scenarios where no transformation is necessary, and inserts pragmas that achieve a favorable QoR.
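The central idea of choosing tile sizes and pragma factors within one optimization problem, rather than in separate passes, can be illustrated with a deliberately tiny, hypothetical model. The latency and resource formulas, the constants, and the names `latency`, `feasible`, and `joint_search` below are assumptions made for illustration only. Exhaustive search stands in for an NLP solver, and nothing here reproduces the authors' actual formulation.

```python
from itertools import product

# Toy, hypothetical model (NOT the paper's formulation): a single loop of
# N iterations is tiled with tile size T (each tile is buffered on chip),
# and the intra-tile loop is unrolled by factor U.
N = 1024
BRAM_WORDS = 512      # assumed on-chip buffer capacity, in words
DSP_BUDGET = 64       # assumed DSP budget
DSP_PER_LANE = 4      # assumed DSPs consumed per unrolled lane
TILE_OVERHEAD = 100   # assumed fixed cycles to start each tile transfer


def divisors(n: int) -> list[int]:
    """All positive divisors of n (candidate tile sizes and unroll factors)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def latency(tile: int, unroll: int) -> int:
    """Per tile: transfer overhead + one word/cycle load + compute at II=1."""
    n_tiles = N // tile
    return n_tiles * (TILE_OVERHEAD + tile + tile // unroll)


def feasible(tile: int, unroll: int) -> bool:
    """Tile fits on chip, unrolled lanes fit the DSP budget, U divides T."""
    return (
        tile <= BRAM_WORDS
        and unroll * DSP_PER_LANE <= DSP_BUDGET
        and tile % unroll == 0
    )


def joint_search() -> tuple[int, int, int]:
    """Choose (tile, unroll) together; brute force stands in for an NLP solver."""
    candidates = [
        (latency(t, u), t, u)
        for t, u in product(divisors(N), repeat=2)
        if feasible(t, u)
    ]
    best_lat, best_t, best_u = min(candidates)
    return best_t, best_u, best_lat


if __name__ == "__main__":
    print(joint_search())  # (512, 16, 1288)
```

In this toy model the largest tile that fits on chip amortizes the per-tile overhead, while the unroll factor is capped by the DSP budget; evaluating both constraints in the same search is what lets the two choices interact.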