This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. The model uses a discrete diffusion process that progressively corrupts graphs with noise by adding or removing edges and changing node and edge categories. A graph transformer network is trained to reverse this process, which reduces distribution learning over graphs to a sequence of node and edge classification tasks. We further improve sample quality by introducing a Markovian noise model that preserves the marginal distribution of node and edge types during diffusion, and by incorporating auxiliary graph-theoretic features. We also propose a procedure for conditioning generation on graph-level features. DiGress achieves state-of-the-art performance on molecular and non-molecular datasets, with up to a 3x improvement in validity on a planar graph dataset. It is also the first model to scale to the large GuacaMol dataset, which contains 1.3M drug-like molecules, without using molecule-specific representations.
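To make the idea of a marginal-preserving Markovian noise model concrete, the following is a minimal sketch, not the implementation used in this work. It assumes one simple construction: with probability `alpha` a category is kept, and otherwise it is resampled from the empirical marginal `m`, giving the transition matrix `Q = alpha * I + (1 - alpha) * 1 m^T`. The function names, the value of `alpha`, and the three-category edge marginals are hypothetical and chosen only for illustration.

```python
import numpy as np


def marginal_transition(marginals, alpha):
    """Build a K x K row-stochastic matrix Q = alpha*I + (1-alpha)*1 m^T.

    Assumed construction for illustration: keep the current category with
    probability alpha, otherwise resample it from the marginal m.
    """
    m = np.asarray(marginals, dtype=float)
    m = m / m.sum()  # normalise so m is a probability vector
    k = m.size
    return alpha * np.eye(k) + (1.0 - alpha) * np.ones((k, 1)) * m[None, :]


def noise_step(labels, Q, rng):
    """Apply one noising step: resample each label from its row of Q."""
    k = Q.shape[0]
    return np.array([rng.choice(k, p=Q[c]) for c in labels])


# Hypothetical edge-type marginals: no edge, single bond, double bond.
m = np.array([0.90, 0.07, 0.03])
Q = marginal_transition(m, alpha=0.8)

# The marginal is a stationary distribution of Q, so noising leaves it unchanged.
stationary_gap = np.abs(m @ Q - m).max()

rng = np.random.default_rng(0)
edges = rng.choice(3, size=1000, p=m)  # edge labels drawn from the marginal
noisy_edges = noise_step(edges, Q, rng)
```

Because `m @ Q` equals `m` under this construction, a population of labels drawn from the marginal keeps the same type frequencies in expectation after any number of noising steps, which is the property the abstract attributes to the noise model.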