Neural network (NN) potentials promise highly accurate molecular dynamics (MD) simulations within the computational complexity of classical MD force fields. However, when applied outside their training domain, NN potential predictions can be inaccurate, which increases the need for uncertainty quantification (UQ). Bayesian modeling provides the mathematical framework for UQ, but classical Bayesian methods based on Markov chain Monte Carlo (MCMC) are computationally intractable for NN potentials. By training graph NN potentials for coarse-grained systems of liquid water and alanine dipeptide, we demonstrate that scalable Bayesian UQ via stochastic gradient MCMC (SG-MCMC) yields reliable uncertainty estimates for MD observables. We show that cold posteriors can reduce the required training data size and that reliable UQ requires multiple Markov chains. Additionally, we find that SG-MCMC and the Deep Ensemble method achieve comparable results, even though the latter requires shorter training and less hyperparameter tuning. We show that both methods can reliably capture aleatoric and epistemic uncertainty, but not systematic uncertainty, which must be minimized by adequate modeling to obtain accurate credible intervals for MD observables. Our results represent a step towards accurate UQ, which is of vital importance for the trustworthy NN potential-based MD simulations required for decision-making in practice.