An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis

This paper introduces an adaptive dataset generation framework for logic synthesis, designed to support machine learning applications within the logic synthesis process. Unlike previous dataset generation flows, which were tailored to specific tasks or lacked integrated machine learning capabilities, the proposed framework supports a comprehensive range of machine learning tasks by encapsulating the three fundamental steps of logic synthesis: Boolean representation, logic optimization, and technology mapping. It preserves the original information in intermediate files, which can be stored in both Verilog and GraphML formats. The Verilog files enable semi-customizability, allowing researchers to add steps and incrementally refine the generated dataset. The framework also includes an adaptive circuit engine that loads the GraphML files for final dataset packaging and sub-dataset extraction. The generated OpenLS-D dataset comprises 46 combinational designs from established benchmarks, totaling over 966,000 Boolean circuits; each design contains 21,000 circuits generated from 1000 synthesis recipes, including 7000 Boolean networks, 7000 ASIC netlists, and 7000 FPGA netlists. Furthermore, OpenLS-D supports the integration of newly desired data features, making it more versatile for new challenges. The utility of OpenLS-D is demonstrated through four distinct downstream tasks: circuit classification, circuit ranking, quality of results (QoR) prediction, and probability prediction. Each task highlights a different internal step of logic synthesis, and its dataset is extracted and relabeled from OpenLS-D using the circuit engine. The experimental results confirm the dataset's diversity and extensive applicability. The source code and datasets are available at https://github.com/Logic-Factory/ACE/blob/master/OpenLS-D/readme.md.
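The dataset sizes quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical and are not taken from the OpenLS-D code base, and the only inputs are the figures stated in the abstract.

```python
# Figures quoted in the abstract; all names here are illustrative assumptions.
NUM_DESIGNS = 46
CIRCUITS_PER_KIND = {
    "boolean_network": 7000,
    "asic_netlist": 7000,
    "fpga_netlist": 7000,
}


def circuits_per_design(per_kind):
    """Sum the circuit counts of every representation kind for one design."""
    return sum(per_kind.values())


def total_circuits(num_designs, per_kind):
    """Multiply the per-design count by the number of designs."""
    return num_designs * circuits_per_design(per_kind)


per_design = circuits_per_design(CIRCUITS_PER_KIND)
total = total_circuits(NUM_DESIGNS, CIRCUITS_PER_KIND)
print(per_design)  # 21000
print(total)       # 966000
```

The per-design sum matches the 21,000 circuits stated for each design, and the product matches the roughly 966,000 circuits reported for the whole dataset.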

