A molecule's 2D representation consists of its atoms, their attributes, and its covalent bonds. A 3D (geometric) representation of a molecule is called a conformer and consists of its atom types and their Cartesian coordinates. Every conformer has a potential energy, and the lower this energy, the more likely the conformer is to occur in nature. Most existing machine learning methods for molecular property prediction consider either 2D molecular graphs or 3D conformer structures in isolation. Inspired by recent work on using ensembles of conformers in conjunction with 2D graph representations, we propose $\mathrm{E}(3)$-invariant molecular conformer aggregation networks. The method integrates a molecule's 2D representation with the representations of several of its conformers. In contrast to prior work, we propose a novel 2D-3D aggregation mechanism based on a differentiable solver for the \emph{Fused Gromov-Wasserstein Barycenter} problem, and we generate conformers with an efficient method based on distance geometry. We show that the proposed aggregation mechanism is $\mathrm{E}(3)$-invariant and provide an efficient GPU implementation. Moreover, we demonstrate that with this aggregation mechanism the method significantly outperforms state-of-the-art molecular property prediction methods on established datasets.
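To make the invariance claim concrete, the following minimal sketch illustrates the underlying idea; it is not the authors' implementation. It assumes that the 3D structure enters the aggregation only through pairwise interatomic distances, as is typical for Gromov-Wasserstein-type objectives. The toy coordinates and the helper names `pairwise_distances`, `apply_e3`, and `max_abs_diff` are hypothetical and chosen for illustration only.

```python
import math


def pairwise_distances(coords):
    """Return the n x n matrix of Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def apply_e3(coords, angle, shift, reflect=False):
    """Rotate about the z-axis by `angle`, optionally mirror x, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in coords:
        xr, yr = c * x - s * y, s * x + c * y
        if reflect:
            xr = -xr  # reflection through the yz-plane
        out.append((xr + shift[0], yr + shift[1], z + shift[2]))
    return out


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy "conformer": four atoms with made-up coordinates.
conformer = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (1.6, 1.2, 0.3), (-0.5, 0.9, -0.7)]

# Apply a rotation, a reflection, and a translation (an element of E(3)).
moved = apply_e3(conformer, angle=0.83, shift=(2.0, -1.5, 4.0), reflect=True)

# The coordinates change, but the distance matrix does not (up to rounding).
diff = max_abs_diff(pairwise_distances(conformer), pairwise_distances(moved))
print(diff < 1e-9)
```

Any quantity computed solely from such distance matrices inherits this invariance, which is why a distance-based aggregation does not depend on how a conformer is positioned or oriented in space.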