Molecular conformer generation (MCG) is an important task in cheminformatics and drug discovery. The ability to efficiently generate low-energy 3D structures can avoid expensive quantum mechanical simulations, accelerating screening and broadening structural exploration. Several generative models have been developed for MCG, but many struggle to consistently produce high-quality conformers. To address this issue, we introduce CoarsenConf, which coarse-grains molecular graphs based on torsional angles and integrates them into an SE(3)-equivariant hierarchical variational autoencoder. Through equivariant coarse-graining, we aggregate the fine-grained atomic coordinates of subgraphs connected via rotatable bonds, creating a variable-length coarse-grained latent representation. Our model uses a novel aggregated attention mechanism to restore fine-grained coordinates from the coarse-grained latent representation, enabling efficient autoregressive generation of large molecules. Furthermore, our work expands current conformer generation benchmarks and introduces new metrics to better evaluate the quality and viability of generated conformers. We demonstrate that CoarsenConf generates more accurate conformer ensembles than prior generative models and traditional cheminformatics methods.
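The coarse-graining step described above can be illustrated with a minimal sketch. It is not the CoarsenConf implementation: the function name `coarse_grain`, the input format, and the use of a plain centroid as the aggregation are all assumptions made for illustration, and the rotatable bonds are supplied by the caller rather than detected from chemistry. The sketch cuts the molecular graph at the given rotatable bonds, treats each remaining connected component as one coarse-grained bead, and pools the atomic coordinates of each bead by averaging, which commutes with rotations and translations.

```python
from collections import defaultdict


def coarse_grain(coords, bonds, rotatable_bonds):
    """Cut the graph at rotatable bonds and pool each fragment's coordinates.

    coords: list of (x, y, z) tuples, one per atom (hypothetical input format).
    bonds: list of (i, j) atom-index pairs.
    rotatable_bonds: subset of bonds to cut (assumed to be given by the caller).
    Returns (fragments, beads): atom indices per fragment and one centroid each.
    """
    cut = {frozenset(b) for b in rotatable_bonds}

    # Build adjacency over the bonds that are NOT rotatable.
    adjacency = defaultdict(list)
    for i, j in bonds:
        if frozenset((i, j)) not in cut:
            adjacency[i].append(j)
            adjacency[j].append(i)

    # Connected components of the cut graph are the coarse-grained fragments.
    fragment_of = [-1] * len(coords)
    fragments = []
    for start in range(len(coords)):
        if fragment_of[start] != -1:
            continue
        fragment_id = len(fragments)
        fragment_of[start] = fragment_id
        members, stack = [start], [start]
        while stack:
            atom = stack.pop()
            for neighbor in adjacency[atom]:
                if fragment_of[neighbor] == -1:
                    fragment_of[neighbor] = fragment_id
                    members.append(neighbor)
                    stack.append(neighbor)
        fragments.append(sorted(members))

    # Centroid pooling: an assumed stand-in for the paper's learned aggregation.
    beads = [
        tuple(sum(coords[a][k] for a in members) / len(members) for k in range(3))
        for members in fragments
    ]
    return fragments, beads


# Toy example: a four-atom chain with one rotatable central bond.
chain_coords = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 1, 0)]
chain_bonds = [(0, 1), (1, 2), (2, 3)]
fragments, beads = coarse_grain(chain_coords, chain_bonds, [(1, 2)])
print(fragments)  # [[0, 1], [2, 3]]
print(beads)      # [(0.5, 0.0, 0.0), (2.5, 1.0, 0.0)]
```

The number of beads equals the number of fragments left after cutting, so molecules with more rotatable bonds yield longer coarse-grained representations, which is the sense in which the representation is variable-length.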