Molecular mechanics (MM) force fields, the models that characterize the energy landscape of molecular systems via simple pairwise and polynomial terms, have traditionally relied on discrete chemical parameter assignment rules (atom or valence types) that are curated by human experts, inflexible, and hard to extend. Recently, there has been significant interest in replacing this process with graph neural networks, so that the parametrization scheme can be learned in an end-to-end differentiable manner directly from quantum chemical calculations or condensed-phase data. In this paper, we extend the Espaloma end-to-end differentiable force field construction approach by incorporating both energy and force fitting to quantum chemical data directly into the training process. Building on the OpenMM SPICE dataset, we curate a dataset covering chemical space broadly relevant to biomolecular modeling, including small molecules, proteins, and RNA. The resulting force field, espaloma 0.3.0, self-consistently parametrizes these diverse biomolecular species, accurately predicts quantum chemical energies and forces, and keeps quantum chemical energy-minimized geometries stable. Surprisingly, this simple approach produces highly accurate protein-ligand binding free energies when the protein and ligand are parametrized self-consistently. This approach, which can fit new force fields to large quantum chemical datasets in one GPU-day, shows significant promise as a path toward systematically more accurate force fields that can be easily extended to new chemical domains of interest.
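To make the idea of fitting energies and forces together concrete, the following is a minimal sketch of a joint loss for a single harmonic bond term. It is not the Espaloma implementation: the function names (`bond_energy`, `bond_forces`, `joint_loss`), the loss weights, the mean-squared-error form, and the synthetic reference data are all assumptions made for illustration. In the approach described above the parameters would be produced by a graph neural network and differentiated automatically; here they are passed in directly and the forces are written out analytically.

```python
import math

def bond_energy(xa, xb, k, r0):
    """Harmonic bond energy 0.5 * k * (r - r0)**2 for two atoms."""
    r = math.dist(xa, xb)
    return 0.5 * k * (r - r0) ** 2

def bond_forces(xa, xb, k, r0):
    """Analytic forces (negative energy gradient) on atoms a and b."""
    r = math.dist(xa, xb)
    # dU/dr = k * (r - r0); the unit vector from a to b is (xb - xa) / r.
    scale = k * (r - r0) / r
    fa = tuple(scale * (b - a) for a, b in zip(xa, xb))
    fb = tuple(-f for f in fa)  # equal and opposite force on atom b
    return fa, fb

def joint_loss(params, snapshots, ref_energies, ref_forces,
               w_energy=1.0, w_force=1.0):
    """Weighted sum of mean squared energy error and mean squared force error.

    The weights and the MSE form are illustrative assumptions.
    """
    k, r0 = params
    e_err, f_err, n_components = 0.0, 0.0, 0
    for (xa, xb), e_ref, (fa_ref, fb_ref) in zip(snapshots, ref_energies, ref_forces):
        e_err += (bond_energy(xa, xb, k, r0) - e_ref) ** 2
        fa, fb = bond_forces(xa, xb, k, r0)
        for pred, ref in zip(fa + fb, fa_ref + fb_ref):
            f_err += (pred - ref) ** 2
            n_components += 1
    return w_energy * e_err / len(snapshots) + w_force * f_err / n_components

# Synthetic stand-in for quantum chemical reference data (hypothetical values).
true_params = (500.0, 1.0)
snapshots = [((0.0, 0.0, 0.0), (d, 0.0, 0.0)) for d in (0.9, 1.0, 1.1, 1.2)]
ref_energies = [bond_energy(xa, xb, *true_params) for xa, xb in snapshots]
ref_forces = [bond_forces(xa, xb, *true_params) for xa, xb in snapshots]

loss_true = joint_loss(true_params, snapshots, ref_energies, ref_forces)
loss_off = joint_loss((400.0, 1.05), snapshots, ref_energies, ref_forces)
```

The force term adds one constraint per Cartesian component of every atom in every snapshot, whereas the energy term adds only one per snapshot, which is one reason to include forces in the fit alongside energies.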