High-level synthesis (HLS), source-to-source compilers, and various Design Space Exploration (DSE) techniques for pragma insertion have significantly improved the Quality of Results (QoR) of generated designs. These tools reduce development time and improve performance. However, achieving high-quality results often requires additional manual code transformations and tile size selections, which are typically performed separately or as pre-processing steps. Although DSE techniques allow code transformations to be applied upfront, the size of the search space usually prevents exploring all possible transformations, making it difficult to determine which ones are necessary. Ensuring correctness also remains challenging, especially for complex transformations and optimizations. To address these challenges, we first propose a comprehensive framework built on HLS compilers. Our system combines code transformation, pragma insertion, and tile size selection for on-chip data caching into a single unified optimization problem, with the aim of increasing parallelization, which is particularly beneficial for computation-bound kernels. Then, employing a novel Non-Linear Programming (NLP) approach, we simultaneously determine transformations, pragmas, and tile sizes, focusing on regular loop-based kernels. Our evaluation demonstrates that the framework identifies the appropriate transformations, including cases where no transformation is necessary, and inserts pragmas to achieve a favorable QoR.