Diffusion generative models have emerged as a powerful framework for problems in structural biology and structure-based drug design, operating directly on 3D molecular structures. However, graph neural networks (GNNs) scale poorly with graph size, and diffusion models are inherently slow at inference. As a result, many existing molecular diffusion models rely on coarse-grained representations of protein structure to keep training and inference tractable. Such coarse-grained representations discard information essential for modeling molecular interactions and degrade the quality of generated structures. In this work, we present a novel GNN-based architecture for learning latent representations of molecular structure. When trained end-to-end with a diffusion model for de novo ligand design, our model performs comparably to a model that uses an all-atom protein representation while reducing inference time threefold.