Diffusion models have recently achieved remarkable performance in data generation, e.g., generating high-quality images. Molecules, however, often have complex non-Euclidean spatial structures whose behavior changes dynamically and unpredictably. Most existing diffusion models rely heavily on probability distributions defined in Euclidean space, typically Gaussian distributions, and therefore cannot capture the internal non-Euclidean structure of molecules, in particular the hierarchical structure of the implicit manifold surface that a molecule represents. It has been observed that complex hierarchical structures become more prominent and easier to capture in a hyperbolic embedding space. To combine the generative power of diffusion models with the ability of hyperbolic embeddings to extract complex geometric features, we extend the diffusion model to hyperbolic manifolds for molecule generation and call the result the Hyperbolic Graph Diffusion Model (HGDM). HGDM first uses a hyperbolic variational autoencoder to generate hyperbolic hidden representations of the nodes, and then uses a score-based hyperbolic graph neural network to learn the distribution in hyperbolic space. Numerical experiments show that HGDM outperforms state-of-the-art methods on several molecular datasets.
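To make the geometric claim concrete, the following is a minimal sketch of two basic hyperbolic operations. It is not the HGDM implementation. It assumes the Poincaré ball model with curvature -1, which the abstract does not specify, and the function names `exp_map_origin` and `poincare_distance` are hypothetical. The sketch illustrates why hierarchies are easier to separate: points near the boundary of the ball are much farther apart in hyperbolic distance than in Euclidean distance.

```python
import math


def exp_map_origin(v):
    """Map a tangent vector at the origin onto the unit Poincare ball.

    Assumes curvature -1: exp_0(v) = tanh(|v|) * v / |v|,
    where |v| is the Euclidean norm of v.
    """
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        return list(v)
    scale = math.tanh(n) / n
    return [scale * x for x in v]


def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit ball."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - sum(a * a for a in x)) * (1.0 - sum(b * b for b in y))
    return math.acosh(1.0 + 2.0 * diff_sq / denom)


# A Euclidean latent vector (illustrative values) lifted into the ball.
point = exp_map_origin([0.3, 0.4])

# Two points near the boundary: close in Euclidean terms, far apart hyperbolically.
a, b = [0.9, 0.0], [0.0, 0.9]
euclidean = math.dist(a, b)
hyperbolic = poincare_distance(a, b)
```

In this sketch the image of the exponential map always lies strictly inside the unit ball, so a Euclidean encoder output can be lifted into hyperbolic space before a distance-based or score-based model operates on it.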