Coarse-grained (CG) molecular simulations offer an efficient alternative to atomistic molecular dynamics for studying large and complex biological systems. The accuracy of CG simulations has improved dramatically with the introduction of machine-learned coarse-grained (MLCG) models. However, these models are typically designed for use at a single thermodynamic state point, lack temperature transferability, and cannot be used to predict temperature-dependent quantities such as the heat capacity. Here we introduce a thermodynamically informed, temperature-transferable MLCG framework for proteins that explicitly decomposes the CG potential of mean force (PMF) into its energetic and entropic components. The model architecture enforces an exact thermodynamic relation between the energetic and entropic components of the PMF and guarantees physically consistent extrapolation and interpolation across temperature regimes. We validate this framework on an extensive dataset comprising a total of 250 $\mu$s of molecular dynamics simulations of the Chignolin protein at five temperatures between 300 K and 400 K, and demonstrate that it reproduces the temperature dependence of the reference atomistic free energy surfaces, correcting the errors of temperature-unaware baselines. Furthermore, we show that an inexpensive, post-hoc temperature-dependent correction, which does not require retraining the MLCG potential, accurately recovers the atomistic heat capacity at different temperatures. Overall, this work provides a physically grounded pathway toward thermodynamically transferable MLCG simulations of complex biomolecular systems.
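
The abstract does not spell out the thermodynamic relation it refers to. As a sketch only, the standard identities below show one way an energy-entropy decomposition of a temperature-dependent PMF can be made exactly consistent. The notation is an assumption, not the authors' own: $F(x,T)$ is the CG PMF at CG configuration $x$ and temperature $T$, $U$ and $S$ are its energetic and entropic components, and $\beta = 1/(k_B T)$.

```latex
% Assumed notation (not taken from the source): F = CG PMF, U = energetic part,
% S = entropic part, x = CG coordinates, beta = 1/(k_B T).
\begin{align}
  F(x,T) &= U(x,T) - T\,S(x,T), \\
  S(x,T) &= -\frac{\partial F(x,T)}{\partial T}, \\
  U(x,T) &= F(x,T) + T\,S(x,T) = \frac{\partial\,[\beta F(x,T)]}{\partial \beta}, \\
  \frac{\partial U(x,T)}{\partial T} &= T\,\frac{\partial S(x,T)}{\partial T}.
\end{align}
% The last line follows by differentiating U = F + T S with respect to T
% and substituting dF/dT = -S.
```

Under these assumptions, a model that parametrizes $U$ and $S$ so that the last identity holds by construction defines a single consistent $F(x,T)$ at every temperature. With the CG partition function $Z(T) = \int e^{-\beta F(x,T)}\,dx$, the configurational internal energy is $-\partial \ln Z / \partial \beta = \langle U \rangle_T$, so the configurational heat capacity is $\partial \langle U \rangle_T / \partial T$. This is why a temperature-independent CG potential cannot, in general, reproduce it.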