Directed acyclic graphs (DAGs) serve as crucial data representations in domains such as hardware synthesis and compiler/program optimization for computing systems. DAG generative models facilitate the creation of synthetic DAGs, which can be used for benchmarking computing systems while preserving intellectual property. However, generating realistic DAGs is challenging due to their inherent directional and logical dependencies. This paper introduces LayerDAG, an autoregressive diffusion model, to address these challenges. LayerDAG decouples the strong node dependencies into manageable units that can be processed sequentially. By interpreting the partial order of nodes as a sequence of bipartite graphs, LayerDAG leverages autoregressive generation to model directional dependencies and employs diffusion models to capture logical dependencies within each bipartite graph. Comparative analyses demonstrate that LayerDAG outperforms existing DAG generative models in both expressiveness and generalization, particularly for generating large-scale DAGs with up to 400 nodes, a critical scenario for system benchmarking. Extensive experiments on both synthetic and real-world flow graphs from various computing platforms show that LayerDAG generates valid DAGs with superior statistical properties and benchmarking performance. The synthetic DAGs generated by LayerDAG also enhance the training of ML-based surrogate models, resulting in improved accuracy in predicting performance metrics of real-world DAGs across diverse computing platforms.
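The layered view described in the abstract, in which a DAG's partial order is read as a sequence of bipartite graphs, can be illustrated with a minimal sketch. The abstract does not state the layering rule, so this example assumes one natural choice: a node's layer is the length of the longest path reaching it from any source node. The function names `layerize` and `bipartite_steps` are hypothetical and are not taken from the LayerDAG code. The sketch covers only the decomposition, not the autoregressive or diffusion components.

```python
from collections import defaultdict, deque


def layerize(num_nodes, edges):
    """Group nodes into layers by longest-path depth from the source nodes.

    Assumption for illustration: layer index = length of the longest path
    from any node with no incoming edges. Nodes are integers 0..num_nodes-1.
    """
    successors = defaultdict(list)
    in_degree = [0] * num_nodes
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Kahn-style topological traversal; a node's depth is final once all of
    # its predecessors have been processed.
    depth = [0] * num_nodes
    queue = deque(n for n in range(num_nodes) if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            depth[nxt] = max(depth[nxt], depth[node] + 1)
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    if visited != num_nodes:
        raise ValueError("input graph contains a cycle")

    layers = [[] for _ in range(max(depth, default=-1) + 1)]
    for node, d in enumerate(depth):
        layers[d].append(node)
    return layers


def bipartite_steps(layers, edges):
    """Attach each edge to the layer of its destination node.

    Step i then holds the edges from earlier layers into layer i, i.e. the
    bipartite graph a layer-by-layer generator would produce at that step.
    """
    layer_of = {n: i for i, layer in enumerate(layers) for n in layer}
    steps = [[] for _ in layers]
    for src, dst in edges:
        steps[layer_of[dst]].append((src, dst))
    return steps
```

For the five-node graph with edges `(0, 2), (1, 2), (2, 3), (0, 3), (1, 4)`, `layerize` returns `[[0, 1], [2, 4], [3]]`: nodes 0 and 1 are sources, nodes 2 and 4 depend only on sources, and node 3 sits one level deeper because of the path through node 2. Under this rule every edge points from an earlier layer to a later one, so each step only needs to decide which existing nodes connect to the newly added layer.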