Developing accurate and efficient coarse-grained representations of proteins is crucial for understanding their folding, function, and interactions over extended timescales. Our method simulates proteins with molecular dynamics and uses the resulting trajectories to train a neural network potential through differentiable trajectory reweighting. Notably, it requires only the native conformations of the training proteins. This removes the need for labeled data derived from extensive simulations and for memory-intensive end-to-end differentiable simulations. Once trained, the model can run parallel molecular dynamics simulations and sample folding events for proteins both within and beyond the training distribution, which demonstrates its ability to extrapolate. Applying Markov state models to these coarse-grained simulations then allows native-like conformations of the simulated proteins to be predicted. Because the approach is transferable in principle and needs only static experimental structures as training data, we anticipate that it will be useful for developing new protein force fields and for further advancing the study of protein dynamics, folding, and interactions.
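The reweighting step can be illustrated with a minimal sketch. It assumes the standard Boltzmann reweighting identity on which differentiable trajectory reweighting relies. Frames x_i sampled with a reference potential U_ref are reused to estimate ensemble averages under updated parameters θ, using normalized weights w_i ∝ exp[-β(U_θ(x_i) - U_ref(x_i))]. Because of this, gradients never have to be propagated back through the simulation itself. Several pieces of the example are illustrative assumptions rather than the authors' setup:

- the one-dimensional harmonic potential is a hypothetical stand-in for the neural network potential;
- the observable x^2 is illustrative and is not the authors' training target;
- the gradient is written out analytically, whereas a real implementation would differentiate the network with an autodiff framework.

```python
import numpy as np


def reweighting_weights(u_new, u_ref, beta):
    """Normalized weights w_i proportional to exp(-beta * (U_theta(x_i) - U_ref(x_i)))."""
    log_w = -beta * (u_new - u_ref)
    log_w -= log_w.max()  # shift the exponent for numerical stability before normalizing
    w = np.exp(log_w)
    return w / w.sum()


def effective_sample_size(w):
    """N_eff = exp(-sum_i w_i ln w_i); equals N when all weights are uniform."""
    w = w[w > 0]  # drop zero weights so the logarithm stays finite
    return float(np.exp(-np.sum(w * np.log(w))))


def reweighted_mean_and_grad(obs, du_dtheta, w, beta):
    """Reweighted <O>_theta and d<O>_theta/dtheta for an observable O(x) independent of theta.

    d<O>/dtheta = -beta * (<O dU/dtheta>_w - <O>_w <dU/dtheta>_w)
    """
    mean_o = np.sum(w * obs)
    cov = np.sum(w * obs * du_dtheta) - mean_o * np.sum(w * du_dtheta)
    return float(mean_o), float(-beta * cov)


# Hypothetical toy model: U_theta(x) = theta * x**2 / 2, sampled once at theta_ref.
# Exact Boltzmann samples from a normal distribution replace an MD trajectory here.
rng = np.random.default_rng(0)
beta, theta_ref, theta = 1.0, 1.0, 1.2
x = rng.normal(0.0, np.sqrt(1.0 / (beta * theta_ref)), size=200_000)

u_ref = 0.5 * theta_ref * x**2
u_new = 0.5 * theta * x**2
w = reweighting_weights(u_new, u_ref, beta)

obs = x**2              # illustrative observable O(x) = x^2
du_dtheta = 0.5 * x**2  # dU_theta/dtheta for the harmonic toy potential
mean_o, grad_o = reweighted_mean_and_grad(obs, du_dtheta, w, beta)

print(f"reweighted <x^2> = {mean_o:.3f}  (exact {1 / (beta * theta):.3f})")
print(f"d<x^2>/dtheta    = {grad_o:.3f}  (exact {-1 / (beta * theta**2):.3f})")
print(f"N_eff / N        = {effective_sample_size(w) / x.size:.3f}")
```

The effective sample size indicates how far the parameters can move before the stored frames stop being representative. When N_eff / N falls well below 1, the reweighted estimates become noisy, and new trajectories would be simulated with the current parameters. The threshold for triggering that step is a tuning choice.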