Denoising diffusion models have shown great potential in multiple research areas. Existing diffusion-based generative methods for de novo 3D molecule generation face two major challenges. First, since the majority of heavy atoms in molecules can connect to multiple atoms through single bonds, modeling molecular geometries with pairwise distances alone is insufficient. The first challenge is therefore to design an effective neural network as the denoising kernel, one capable of capturing complex multi-body interatomic relationships and learning high-quality features. Second, due to the discrete nature of graphs, mainstream diffusion-based methods for molecules rely heavily on predefined rules and generate edges indirectly. The second challenge is therefore to adapt molecule generation to diffusion and to accurately predict the existence of bonds. In this work, we view the iterative updating of molecular conformations in the diffusion process as consistent with molecular dynamics, and we introduce a novel molecule generation method named Geometric-Facilitated Molecular Diffusion (GFMDiff). For the first challenge, we introduce a Dual-Track Transformer Network (DTN) to fully excavate global spatial relationships and learn high-quality representations, which contribute to accurate predictions of features and geometries. For the second challenge, we design a Geometric-Facilitated Loss (GFLoss), which intervenes in the formation of bonds during training instead of directly embedding edges into the latent space. Comprehensive experiments on current benchmarks demonstrate the superiority of GFMDiff.
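To make the point about pairwise distances concrete, the following is a minimal sketch and not part of GFMDiff or the DTN. It assumes a toy triatomic fragment with a central atom and two bonded neighbors; the helper names `bond_lengths`, `bond_angle`, and `bent_triatomic` are hypothetical. Two conformations share identical bonded distances yet differ in the three-body bond angle, so a model that sees only bonded (or cutoff-limited) distances cannot tell them apart. The full distance matrix would differ through the neighbor-to-neighbor distance, which is itself a multi-body relationship.

```python
import numpy as np

def bond_lengths(center, neighbors):
    """Distances from a central atom to each bonded neighbor (two-body terms)."""
    return np.linalg.norm(neighbors - center, axis=1)

def bond_angle(center, a, b):
    """Angle a-center-b in degrees (a three-body term)."""
    u, v = a - center, b - center
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def bent_triatomic(length, angle_deg):
    """Toy fragment: central atom at the origin, two neighbors at a given
    bond length and bond angle (hypothetical helper for illustration)."""
    theta = np.radians(angle_deg)
    center = np.zeros(3)
    a = np.array([length, 0.0, 0.0])
    b = np.array([length * np.cos(theta), length * np.sin(theta), 0.0])
    return center, a, b

# Two conformations with the same bond lengths but different bond angles.
c1, a1, b1 = bent_triatomic(1.0, 104.5)
c2, a2, b2 = bent_triatomic(1.0, 180.0)

same_lengths = np.allclose(
    bond_lengths(c1, np.stack([a1, b1])),
    bond_lengths(c2, np.stack([a2, b2])),
)
angle_1 = bond_angle(c1, a1, b1)
angle_2 = bond_angle(c2, a2, b2)

print("bonded distances identical:", same_lengths)
print("bond angles (degrees):", angle_1, angle_2)
```

In this sketch the bonded distances match while the angles do not, which is the kind of multi-body information a denoising network would need to capture beyond pairwise bonded terms.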