Molecular dynamics simulations are important in computational physics, chemistry, materials science, and biology. Machine learning-based methods have shown strong abilities in predicting molecular energies and properties and are much faster than density functional theory (DFT) calculations. Molecular energy depends on at least atoms, bonds, bond angles, torsion angles, and nonbonding atom pairs. Previous Transformer models take only atoms as inputs and therefore lack explicit modeling of these factors. To alleviate this limitation, we propose Moleformer, a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them using a rotationally and translationally invariant, geometry-aware spatial encoding. The proposed spatial encoding computes relative position information, including distances and angles, among nodes and edges. We benchmark Moleformer on the OC20 and QM9 datasets. Our model achieves state-of-the-art performance on the initial-state-to-relaxed-energy prediction task of OC20 and is highly competitive on QM9 quantum chemical property prediction compared with other Transformer and Graph Neural Network (GNN) methods, which demonstrates the effectiveness of the proposed geometry-aware spatial encoding.
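To make the invariance property concrete, the following is a minimal sketch, not Moleformer's actual encoding, of why distances and angles are natural inputs for a rotationally and translationally invariant spatial encoding. It assumes NumPy and a toy three-atom geometry; the helper names (`pairwise_distances`, `angle_at`, `random_rotation`) and the coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairwise_distances(pos):
    """Return the (N, N) matrix of Euclidean distances between atoms."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def angle_at(pos, i, j, k):
    """Return the angle in radians at atom j between vectors j->i and j->k."""
    u = pos[i] - pos[j]
    v = pos[k] - pos[j]
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))

def random_rotation(rng):
    """Build a proper 3x3 rotation matrix from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs; q stays orthogonal
    if np.linalg.det(q) < 0:      # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

# Hypothetical toy geometry: three atoms in a bent arrangement.
pos = np.array([[0.00, 0.00, 0.0],
                [0.96, 0.00, 0.0],
                [-0.24, 0.93, 0.0]])

rng = np.random.default_rng(0)
R = random_rotation(rng)
t = rng.normal(size=3)
moved = pos @ R.T + t             # rotate, then translate, the whole molecule

# Distances and angles are unchanged by the rigid motion.
assert np.allclose(pairwise_distances(pos), pairwise_distances(moved))
assert np.isclose(angle_at(pos, 1, 0, 2), angle_at(moved, 1, 0, 2))
```

Because these quantities depend only on relative positions, a model that consumes them instead of raw coordinates yields the same prediction for any rigid motion of the input structure. How Moleformer combines such quantities across nodes and edges is not specified in this abstract.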