Efficient sampling of the Boltzmann distribution of molecular systems is a long-standing challenge. Recently, instead of running long molecular dynamics simulations, generative machine learning methods such as normalizing flows have been used to learn the Boltzmann distribution directly, without samples. However, this approach is susceptible to mode collapse and thus often does not explore the full configurational space. In this work, we address this challenge by separating the problem into two levels: the fine-grained and the coarse-grained degrees of freedom. A normalizing flow conditioned on the coarse-grained space yields a probabilistic connection between the two levels. To explore the configurational space, we employ coarse-grained simulations with active learning, which allows us to update the flow and perform all-atom potential energy evaluations only when necessary. Using alanine dipeptide as an example, we show that our methods achieve speedups over molecular dynamics simulations of approximately 15.9 to 216.2, compared with a speedup of 4.5 for the current state-of-the-art machine learning approach.
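To make the "probabilistic connection between the two levels" concrete, the following is a minimal sketch of one way a conditional sampler can be turned into a coarse-grained free energy by importance sampling: F(s) = -kT log E_{z~q(z|s)}[exp(-U(s,z)/kT) / q(z|s)]. It is an illustration under stated assumptions, not the implementation used in this work. The toy energy `potential`, the helpers `stiffness` and `center`, the reduced-unit constant `KT`, and the Gaussian `sample_conditional` (standing in for a trained conditional normalizing flow) are all hypothetical.

```python
import math
import random

KT = 1.0  # thermal energy in reduced units (assumption for this toy example)


def stiffness(s):
    """Hypothetical s-dependent force constant of the fine-grained coordinate."""
    return 1.0 + s * s


def center(s):
    """Hypothetical s-dependent equilibrium position of the fine-grained coordinate."""
    return 0.5 * s


def potential(s, z):
    """Toy 'all-atom' energy: double well in s, harmonic in z (not a molecular force field)."""
    return (s * s - 1.0) ** 2 + 0.5 * stiffness(s) * (z - center(s)) ** 2


def sample_conditional(s, rng, scale=1.2):
    """Stand-in for a conditional flow: draw z ~ q(z|s) and return (z, log q(z|s)).

    A Gaussian slightly wider than the true conditional is used so that the
    importance weights are bounded but not trivially constant.
    """
    sigma = scale * math.sqrt(KT / stiffness(s))
    z = rng.gauss(center(s), sigma)
    log_q = -0.5 * ((z - center(s)) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return z, log_q


def coarse_free_energy(s, n_samples=20000, seed=0):
    """Estimate F(s) = -kT log E_q[exp(-U/kT) / q] with a log-sum-exp for stability."""
    rng = random.Random(seed)
    log_w = []
    for _ in range(n_samples):
        z, log_q = sample_conditional(s, rng)
        log_w.append(-potential(s, z) / KT - log_q)
    m = max(log_w)
    log_mean = m + math.log(sum(math.exp(lw - m) for lw in log_w) / n_samples)
    return -KT * log_mean


def exact_free_energy(s):
    """Analytic reference: the Gaussian integral over z can be done in closed form."""
    return (s * s - 1.0) ** 2 - 0.5 * KT * math.log(2.0 * math.pi * KT / stiffness(s))


if __name__ == "__main__":
    for s in (-1.0, 0.0, 0.7):
        print(s, coarse_free_energy(s), exact_free_energy(s))
```

In this sketch, every call to `potential` plays the role of an all-atom energy evaluation, which is the cost that an active learning scheme would aim to incur only where the conditional sampler is still unreliable.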