Molecular mechanics (MM) force fields, fast empirical models that characterize the potential energy surface of molecular systems via simple parametric pairwise and valence interactions, have traditionally relied on labor-intensive, inflexible, and poorly extensible parameter assignment rules that use look-up tables of discrete atom or interaction types. Here, we introduce a machine-learned MM force field, espaloma-0.3, in which the rule-based discrete atom-typing schemes are replaced with continuous atom representations computed by graph neural networks. Trained in an end-to-end differentiable manner directly on a large, diverse quantum chemical dataset of over 1.1M energy and force calculations, espaloma-0.3 covers chemical spaces highly relevant to biomolecular modeling, including small molecules, proteins, and RNA. We show that espaloma-0.3 accurately predicts quantum chemical energies and forces while keeping quantum chemical energy-minimized geometries stable. It can self-consistently parameterize both protein and ligand, producing highly accurate protein-ligand binding free energy predictions. Because new force fields can be fit to large quantum chemical datasets with a single GPU-day of training, this approach shows significant promise as a path toward systematically more accurate force fields that can be easily extended to new chemical domains of interest. The espaloma-0.3 force field is available for use directly or within OpenMM via the open-source Espaloma package (https://github.com/choderalab/espaloma), and both the code and the datasets for constructing this force field are openly available (https://github.com/choderalab/refit-espaloma).
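To make the central idea concrete, the following is a minimal sketch of how continuous atom representations can replace a look-up table of atom types. It is not the espaloma-0.3 architecture or the Espaloma API: the single message-passing step, the random untrained weights (`W_self`, `W_neigh`, `W_out`), the sum pooling, and the helper names are all assumptions chosen for illustration, and only a harmonic bond term is shown. The two carbon atoms in the toy chain receive different embeddings because their neighborhoods differ, so no discrete atom type is ever assigned.

```python
import math
import random

random.seed(0)

# Toy molecule: a three-atom chain O-C-C. Features are one-hot element labels [C, O].
atom_features = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
bonds = [(0, 1), (1, 2)]
DIM = 4  # embedding width (arbitrary choice for this sketch)


def random_matrix(rows, cols):
    """Hypothetical, untrained weights; a real model would learn these from QC data."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


W_self = random_matrix(2, DIM)
W_neigh = random_matrix(2, DIM)
W_out = random_matrix(DIM, 2)


def matvec(vector, matrix):
    """Row vector times matrix."""
    return [sum(v * row[c] for v, row in zip(vector, matrix))
            for c in range(len(matrix[0]))]


def atom_embeddings(features, bonds):
    """One round of message passing: each atom mixes in its bonded neighbors' features."""
    neighbor_sum = [[0.0] * len(features[0]) for _ in features]
    for i, j in bonds:
        for c in range(len(features[0])):
            neighbor_sum[i][c] += features[j][c]
            neighbor_sum[j][c] += features[i][c]
    embeddings = []
    for own, neigh in zip(features, neighbor_sum):
        a = matvec(own, W_self)
        b = matvec(neigh, W_neigh)
        embeddings.append([math.tanh(x + y) for x, y in zip(a, b)])
    return embeddings


def bond_parameters(h, i, j):
    """Map a bond to (force constant, equilibrium length), independent of atom order."""
    pooled = [a + b for a, b in zip(h[i], h[j])]   # symmetric in i and j
    raw_k, raw_r0 = matvec(pooled, W_out)
    k = math.exp(raw_k)                            # keeps the force constant positive
    r0 = 1.0 + 0.5 * math.tanh(raw_r0)             # bounded length, arbitrary units
    return k, r0


def bond_energy(h, bonds, positions):
    """Sum of harmonic bond terms 0.5 * k * (r - r0)**2."""
    energy = 0.0
    for i, j in bonds:
        k, r0 = bond_parameters(h, i, j)
        r = math.dist(positions[i], positions[j])
        energy += 0.5 * k * (r - r0) ** 2
    return energy


h = atom_embeddings(atom_features, bonds)
positions = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (2.4, 0.5, 0.0)]
total_energy = bond_energy(h, bonds, positions)
```

Because every step from atom features to energy is a smooth function of the weights, such a model can in principle be fit by gradient descent against reference energies and forces, which is the sense in which the abstract describes training as end-to-end differentiable.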