Machine-learned coarse-grained (CG) models have the potential to simulate large molecular complexes that are out of reach for atomistic molecular dynamics. However, training accurate CG models remains a challenge. A widely used approach to learning CG force fields maps forces from all-atom molecular dynamics to the CG representation and fits a CG force field that matches them on average. We show that there is flexibility in how all-atom forces are mapped to the CG representation, and that the most commonly used mapping methods are statistically inefficient and potentially even incorrect when the all-atom simulation contains constraints. We formulate an optimization problem for force mappings and demonstrate that substantially improved CG force fields can be learned from the same simulation data when optimized force maps are used. The method is demonstrated on the miniproteins Chignolin and Tryptophan Cage and published as open-source code.
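The force-matching setup described in the abstract can be illustrated with a small NumPy sketch. This is not the published code: the function names (`slice_force_map`, `map_forces`, `force_matching_loss`), the toy 4-atom system, and the consistency check `B @ M.T == I` for a linear configurational map `M` are assumptions made for illustration only. The sketch shows the basic force map, in which each CG bead receives only the force on the atom it retains, and a second map that passes the same check, which is one simple way to see the flexibility the abstract refers to.

```python
import numpy as np

def slice_force_map(n_atoms, kept):
    """Basic force map: bead i receives only the force on atom kept[i]."""
    B = np.zeros((len(kept), n_atoms))
    B[np.arange(len(kept)), kept] = 1.0
    return B

def map_forces(B, atom_forces):
    """Apply a linear force map B (n_beads, n_atoms)
    to all-atom forces of shape (n_frames, n_atoms, 3)."""
    return np.einsum("ba,fad->fbd", B, atom_forces)

def force_matching_loss(cg_forces_pred, mapped_forces):
    """Mean squared deviation between predicted CG forces and mapped forces."""
    diff = cg_forces_pred - mapped_forces
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

# Toy system (hypothetical): 4 atoms, beads placed on atoms 0 and 2.
n_atoms, kept = 4, [0, 2]
M = slice_force_map(n_atoms, kept)   # configurational map (atom selection)
B_basic = slice_force_map(n_atoms, kept)

# An alternative force map: bead 0 also collects the force on atom 1,
# which is not retained by any bead.
B_alt = B_basic.copy()
B_alt[0, 1] = 1.0

# Assumed consistency condition for a linear configurational map M.
for B in (B_basic, B_alt):
    assert np.allclose(B @ M.T, np.eye(len(kept)))

rng = np.random.default_rng(0)
atom_forces = rng.normal(size=(5, n_atoms, 3))   # stand-in for MD forces

mapped_basic = map_forces(B_basic, atom_forces)
mapped_alt = map_forces(B_alt, atom_forces)

# A perfect model of the mapped forces gives zero loss.
loss_perfect = force_matching_loss(mapped_basic, mapped_basic)
```

Both `B_basic` and `B_alt` pass the assumed check but produce different training targets, so the choice of force map affects the noise in the forces the CG model is fitted to. Selecting the map by an optimization criterion, as the abstract proposes, is not implemented here.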