Graph generation is a challenging task because it requires predicting a complete graph, with multiple nodes and edges, from only a given label. The task is also fundamentally important to numerous real-world applications, including de-novo drug and molecular design. Several successful graph generation methods have emerged in recent years. However, these approaches have two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures they use are often underexplored; and (2) they are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs in the context of molecular graph generation by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs in two different generative frameworks (GCPN and GraphAF) on six different molecular generative objectives on the ZINC-250k dataset. Through extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN and GraphAF on molecular generation tasks, but that GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results compared with 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design.