Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). However, the scarcity of accessible, high-quality HLS datasets and the complexity of building them pose significant challenges. Existing datasets are limited in benchmark coverage, design space enumeration, or vendor extensibility, or they lack reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, which limits wider adoption of these datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage that elaborates single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage that executes HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage that extracts standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage through design space expansion and supports multiple vendor tools. Users can contribute their own HLS designs and synthesis results at each stage, and can extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs drawn from common HLS benchmarks and curated open-source HLS designs. We showcase the versatility and breadth of functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking. Code is available at https://github.com/sharc-lab/HLSFactory.
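To make the first stage (design space expansion) concrete, the following is a minimal sketch, assuming a design space is the full Cartesian product of per-knob directive options. It is not HLSFactory's API: the knob names (`loop_unroll`, `loop_pipeline`, `array_partition`), the loop label `top/L1`, the array name `A`, and the helper functions `expand` and `to_tcl` are all hypothetical. The emitted lines imitate the style of Vitis HLS Tcl directives and are illustrative only, not checked against a specific tool version.

```python
import itertools
from typing import Any, Dict, List, Sequence

# Hypothetical directive search space for a single HLS design:
# each key is a tunable knob, each value lists the options to enumerate.
SPACE: Dict[str, Sequence[Any]] = {
    "loop_unroll": [1, 2, 4],
    "loop_pipeline": [False, True],
    "array_partition": [1, 2],
}


def expand(space: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Enumerate every directive combination (the full Cartesian product).

    Keys are sorted so the enumeration order is deterministic.
    """
    keys = sorted(space)
    return [
        dict(zip(keys, combo))
        for combo in itertools.product(*(space[k] for k in keys))
    ]


def to_tcl(point: Dict[str, Any], top: str = "top",
           loop: str = "top/L1", array: str = "A") -> List[str]:
    """Render one design point as Vitis-HLS-style Tcl directive lines.

    Illustrative only: the top function, loop label, and array name
    are hypothetical placeholders.
    """
    lines: List[str] = []
    if point["loop_unroll"] > 1:
        lines.append(f'set_directive_unroll -factor {point["loop_unroll"]} "{loop}"')
    if point["loop_pipeline"]:
        lines.append(f'set_directive_pipeline "{loop}"')
    if point["array_partition"] > 1:
        lines.append(
            f'set_directive_array_partition -type cyclic '
            f'-factor {point["array_partition"]} -dim 1 "{top}" {array}'
        )
    return lines


if __name__ == "__main__":
    points = expand(SPACE)
    print(len(points))  # 2 * 2 * 3 = 12 design points
    for line in to_tcl(points[-1]):
        print(line)
```

Full enumeration grows multiplicatively with each added knob, which is why a framework at this scale also benefits from sampling (case study I) and concurrent synthesis (stage 2) rather than synthesizing every point serially.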