High-level synthesis (HLS), source-to-source compilers, and various design space exploration (DSE) techniques for pragma insertion have significantly improved the Quality of Results (QoR) of generated designs. These tools reduce development time and improve performance. However, achieving high QoR often still requires manual code transformations and tile size selection, which are typically performed separately or as pre-processing steps. Although DSE techniques can apply code transformations upfront, the search space is so large that exploring all possible transformations is usually infeasible, which makes it hard to determine which transformations are actually needed. Ensuring correctness is a further challenge, especially for complex transformations and optimizations. To address these challenges, we first propose a comprehensive framework that leverages HLS compilers. Our system combines code transformation, pragma insertion, and tile size selection for on-chip data caching into a single unified optimization problem, with the aim of increasing parallelism, which is particularly beneficial for computation-bound kernels. Then, employing a novel Non-Linear Programming (NLP) approach, we simultaneously determine transformations, pragmas, and tile sizes, focusing on regular loop-based kernels. Our evaluation demonstrates that the framework identifies the appropriate transformations, including cases where no transformation is necessary, and inserts pragmas that achieve a favorable QoR.
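To make the idea of a unified optimization problem concrete, the following is a minimal sketch and not the framework's actual formulation. It replaces the NLP solver with plain exhaustive enumeration over a tiny search space, and the latency model, the resource limits, and every name in it (`search`, `dep_ii`, `mem_words`, `dsp_budget`, `setup`) are hypothetical assumptions made for illustration. It jointly picks a tile size, an unroll factor, and whether to apply a loop interchange for a single loop of `trip` iterations.

```python
def divisors(n):
    """All positive divisors of n, so tiles and unroll factors divide evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


def search(trip, dep_ii, mem_words, dsp_budget, setup=64):
    """Jointly choose (tile, unroll, interchange) minimizing a toy latency.

    Hypothetical model, assumed for illustration only:
      - each tile costs `setup` cycles plus one cycle per word to load on chip;
      - without interchange the pipeline runs at initiation interval `dep_ii`
        (a loop-carried dependency); with interchange it runs at II = 1 but
        needs a second on-chip buffer of the same size;
      - each unrolled copy of the loop body consumes one DSP.
    Returns (latency, tile, unroll, interchange) or None if nothing fits.
    """
    best = None
    for tile in divisors(trip):
        for unroll in divisors(tile):
            if unroll > dsp_budget:            # compute resource constraint
                continue
            for interchange in (False, True):
                buffers = 2 if interchange else 1
                if tile * buffers > mem_words:  # on-chip memory constraint
                    continue
                ii = 1 if interchange else dep_ii
                n_tiles = trip // tile
                latency = n_tiles * (setup + tile + (tile // unroll) * ii)
                candidate = (latency, tile, unroll, interchange)
                if best is None or candidate < best:
                    best = candidate
    return best


if __name__ == "__main__":
    # Kernel with a loop-carried dependency: the interchange is selected.
    print(search(trip=1024, dep_ii=4, mem_words=512, dsp_budget=16))
    # Kernel without the dependency: no transformation is selected.
    print(search(trip=1024, dep_ii=1, mem_words=512, dsp_budget=16))
```

Under these assumed numbers, the first call selects the interchange because removing the dependency outweighs the smaller tile, while the second call keeps the original loop order, which mirrors the case where no transformation is necessary. Enumeration only works here because the space is tiny; it is a stand-in for the solver-based approach described above, not a substitute for it.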