Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets are limited in benchmark coverage, design space enumeration, or vendor extensibility, or they lack reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This three-stage architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results, and can extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs drawn from common HLS benchmarks and curated open-source HLS designs. We showcase the versatility and multi-functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking. Code is available at https://github.com/sharc-lab/HLSFactory.
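To make the three-stage structure concrete, the following is a minimal Python sketch of an expand, synthesize, and aggregate pipeline. It does not use the HLSFactory API: every name in it (`DIRECTIVE_OPTIONS`, `expand_design_space`, `run_synthesis`, `aggregate`, `build_dataset`) is hypothetical, the directive values are illustrative, and the synthesis step is a toy cost model standing in for a real vendor tool invocation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical directive options for one design; names and values are illustrative only.
DIRECTIVE_OPTIONS = {
    "unroll_factor": [1, 2, 4],
    "pipeline": [False, True],
    "array_partition_factor": [1, 2],
}


def expand_design_space(design_name, options):
    """Stage 1: enumerate every combination of directive values for one design."""
    keys = sorted(options)
    return [
        {"design": design_name, "directives": dict(zip(keys, values))}
        for values in product(*(options[k] for k in keys))
    ]


def run_synthesis(point):
    """Stage 2 stand-in: a real flow would invoke a vendor HLS tool here.

    The numbers below come from a toy cost model, not from any tool.
    """
    d = point["directives"]
    parallelism = d["unroll_factor"] * d["array_partition_factor"]
    latency = 1024 // parallelism
    if d["pipeline"]:
        latency //= 2
    return {**point, "latency_cycles": latency, "lut_estimate": 100 * parallelism}


def aggregate(results):
    """Stage 3: flatten results into uniform rows suitable for an ML dataset."""
    return [
        {
            "design": r["design"],
            **r["directives"],
            "latency_cycles": r["latency_cycles"],
            "lut_estimate": r["lut_estimate"],
        }
        for r in results
    ]


def build_dataset(design_name, options, workers=4):
    """Run the three stages in order, synthesizing design points concurrently."""
    points = expand_design_space(design_name, options)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_synthesis, points))
    return aggregate(results)


if __name__ == "__main__":
    rows = build_dataset("gemm", DIRECTIVE_OPTIONS)
    print(len(rows), "design points")
    print(rows[0])
```

In this sketch a single design with three directives expands into 3 × 2 × 2 = 12 design points. A thread pool is used on the assumption that each synthesis job would mostly wait on an external tool process; a process pool or a job scheduler would be an equally reasonable choice.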