Diffusion models have recently achieved remarkable performance in data generation, e.g., in generating high-quality images. Chemical molecules, however, often have complex non-Euclidean spatial structures whose behavior changes dynamically and unpredictably. Most existing diffusion models rely heavily on a probability distribution, i.e., a Gaussian distribution, defined in Euclidean space, which cannot capture the intrinsic non-Euclidean structure of molecules, in particular the hierarchical structure of the implicit manifold surface that molecules represent. Complex hierarchical structures have been observed to become more prominent and easier to capture in hyperbolic embedding space. To leverage both the generative power of diffusion models and the ability of hyperbolic embeddings to extract complex geometric features, we propose to extend the diffusion model to hyperbolic manifolds for molecule generation, yielding the Hyperbolic Graph Diffusion Model (HGDM). HGDM employs a hyperbolic variational autoencoder to generate hyperbolic hidden representations of nodes, and then uses a score-based hyperbolic graph neural network to learn the distribution in hyperbolic space. Numerical experiments show that HGDM achieves higher performance than state-of-the-art methods on several molecular datasets.
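To make the geometric intuition behind hyperbolic embeddings concrete, the following is a minimal NumPy sketch, not the HGDM implementation. It assumes the Poincaré ball model with curvature -c, which the abstract does not specify, and the function names `expmap0` and `poincare_dist` are hypothetical. It illustrates the standard fact that distances grow rapidly toward the boundary of the ball, which is what leaves room for tree-like, hierarchical structure.

```python
import numpy as np

def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Maps a Euclidean (tangent) vector v to a point strictly inside the ball.
    The ball model is an assumption made for this illustration.
    """
    sqrt_c = np.sqrt(c)
    # Clamp the norm to avoid division by zero for the zero vector.
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-15)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0):
    """Geodesic distance between points x and y of the Poincare ball."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x * x, axis=-1)) * (1.0 - c * np.sum(y * y, axis=-1))
    return np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c)

# A tangent vector of Euclidean norm 0.5 lands at hyperbolic distance
# 2 * 0.5 from the origin (the metric at the origin scales lengths by 2).
v = np.array([0.3, 0.4])
origin = np.zeros(2)
d_origin = poincare_dist(origin, expmap0(v))

# Two pairs with the same Euclidean gap (0.1): one near the center,
# one near the boundary of the unit ball.
d_center = poincare_dist(np.array([0.0, 0.0]), np.array([0.1, 0.0]))
d_boundary = poincare_dist(np.array([0.85, 0.0]), np.array([0.95, 0.0]))
```

Here `d_boundary` exceeds `d_center` even though both pairs are equally far apart in Euclidean terms. A Gaussian defined in Euclidean space does not account for this distortion, which is the gap the abstract points to.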