This paper introduces an adaptive logic synthesis dataset generation framework designed to support machine learning applications within the logic synthesis process. Unlike previous dataset generation flows, which were tailored to specific tasks or lacked integrated machine learning capabilities, the proposed framework supports a comprehensive range of machine learning tasks by encapsulating the three fundamental steps of logic synthesis: Boolean representation, logic optimization, and technology mapping. It preserves the original information in intermediate files, which can be stored in both Verilog and GraphML formats. The Verilog files enable semi-customizability, allowing researchers to add steps and incrementally refine the generated dataset. The framework also includes an adaptive circuit engine that loads the GraphML files for final dataset packaging and sub-dataset extraction. The generated OpenLS-D dataset comprises 46 combinational designs from established benchmarks, totaling over 966,000 Boolean circuits; each design contains 21,000 circuits generated from 1000 synthesis recipes, including 7000 Boolean networks, 7000 ASIC netlists, and 7000 FPGA netlists. Furthermore, OpenLS-D supports the integration of new data features, making it adaptable to new challenges. The utility of OpenLS-D is demonstrated through four distinct downstream tasks: circuit classification, circuit ranking, quality of results (QoR) prediction, and probability prediction. Each task highlights a different internal step of logic synthesis, and its dataset is extracted and relabeled from OpenLS-D using the circuit engine. The experimental results confirm the dataset's diversity and broad applicability. The source code and datasets are available at https://github.com/Logic-Factory/ACE/blob/master/OpenLS-D/readme.md.
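To make the GraphML loading step concrete, the following is a minimal sketch of reading a circuit graph using only the Python standard library. It is not the framework's circuit engine: the function name `load_graphml`, the toy circuit, and the node attribute `type` with values such as `PI`, `AND`, and `PO` are assumptions made for illustration, and the actual OpenLS-D schema may differ.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical toy circuit (a single AND gate); not taken from OpenLS-D.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="type" attr.type="string"/>
  <graph id="toy" edgedefault="directed">
    <node id="a"><data key="d0">PI</data></node>
    <node id="b"><data key="d0">PI</data></node>
    <node id="n1"><data key="d0">AND</data></node>
    <node id="y"><data key="d0">PO</data></node>
    <edge source="a" target="n1"/>
    <edge source="b" target="n1"/>
    <edge source="n1" target="y"/>
  </graph>
</graphml>"""


def load_graphml(text):
    """Parse GraphML text into (nodes, edges).

    nodes maps node id -> {attribute name: value};
    edges is a list of (source id, target id) tuples.
    """
    root = ET.fromstring(text)

    # Map each <key> id (e.g. "d0") to its readable attribute name (e.g. "type").
    key_names = {k.get("id"): k.get("attr.name") for k in root.iterfind("g:key", NS)}

    nodes = {}
    for node in root.iterfind(".//g:node", NS):
        attrs = {}
        for data in node.iterfind("g:data", NS):
            key = data.get("key")
            attrs[key_names.get(key, key)] = data.text
        nodes[node.get("id")] = attrs

    edges = [(e.get("source"), e.get("target")) for e in root.iterfind(".//g:edge", NS)]
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = load_graphml(SAMPLE)
    print(len(nodes), "nodes,", len(edges), "edges")
```

Once a circuit is in this node and edge form, selecting or relabeling a subset (for example, keeping only circuits with a given gate count) amounts to filtering over plain Python dictionaries and lists.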