High-level synthesis (HLS), source-to-source compilers, and various Design Space Exploration (DSE) techniques for pragma insertion have significantly improved the Quality of Results (QoR) of generated designs. These tools reduce development time and improve performance. However, achieving high-quality results often requires additional manual code transformations and tile size selection, which are typically performed separately or as pre-processing steps. Although DSE techniques allow code transformations to be applied upfront, the size of the search space often prevents exploring all possible transformations, making it hard to determine which ones are necessary. Ensuring correctness also remains difficult, especially for complex transformations and optimizations. To address these challenges, we first propose a comprehensive framework that leverages HLS compilers. Our system combines code transformation, pragma insertion, and tile size selection for on-chip data caching into a single unified optimization problem, with the aim of increasing parallelism, which particularly benefits computation-bound kernels. Then, employing a novel Non-Linear Programming (NLP) approach, we determine transformations, pragmas, and tile sizes simultaneously, focusing on regular loop-based kernels. Our evaluation demonstrates that the framework identifies the appropriate transformations, including cases where no transformation is necessary, and inserts pragmas that achieve a favorable QoR.
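To make the idea of choosing tile sizes and pragma parameters jointly more concrete, here is a minimal sketch in Python. It is not the NLP formulation from the paper: it swaps in exhaustive enumeration for an NLP solver, and the latency model, the per-tile `overhead`, the resource limits, and the function names `divisors`, `estimated_latency`, and `search` are all hypothetical, chosen only for illustration. It shows a single objective being minimized over a tile size and an unroll factor at once, subject to an on-chip buffer limit and a parallel-lane limit.

```python
def divisors(n):
    """Return all positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def estimated_latency(n, tile, unroll, overhead):
    """Toy latency model (an assumption, not a vendor or paper model).

    The loop of n iterations is split into n // tile tiles. Each tile pays
    a fixed transfer overhead, then runs tile // unroll compute cycles
    because `unroll` iterations execute in parallel.
    """
    num_tiles = n // tile
    return num_tiles * (overhead + tile // unroll)


def search(n, buffer_words, max_lanes, overhead):
    """Jointly pick (tile, unroll) minimizing the toy latency.

    Constraints: the tile must fit in the on-chip buffer, the unroll
    factor must not exceed the available lanes, tile must divide n,
    and unroll must divide tile.
    Returns (latency, tile, unroll); ties go to the smaller tile.
    """
    candidates = []
    for tile in divisors(n):
        if tile > buffer_words:
            continue
        for unroll in divisors(tile):
            if unroll > max_lanes:
                continue
            latency = estimated_latency(n, tile, unroll, overhead)
            candidates.append((latency, tile, unroll))
    return min(candidates)


if __name__ == "__main__":
    print(search(n=1024, buffer_words=256, max_lanes=16, overhead=10))
```

Even in this toy setting the two decisions interact: a larger tile amortizes the per-tile overhead and also admits a larger unroll factor, which is why treating them as one problem can find configurations that tuning each knob separately would miss. Setting `max_lanes=1` models the case where no parallelization is available, and the search then only trades off tile size.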