Efficient sampling of the Boltzmann distribution of molecular systems is a long-standing challenge. Recently, instead of generating long molecular dynamics simulations, generative machine learning methods such as normalizing flows have been used to learn the Boltzmann distribution directly, without samples. However, this approach is susceptible to mode collapse and thus often fails to explore the full configurational space. In this work, we address this challenge by separating the problem into two levels: the fine-grained and the coarse-grained degrees of freedom. A normalizing flow conditioned on the coarse-grained space yields a probabilistic connection between the two levels. To explore the configurational space, we employ coarse-grained simulations with active learning, which allows us to update the flow and make all-atom potential energy evaluations only when necessary. Using alanine dipeptide as an example, we show that our methods obtain a speedup over molecular dynamics simulations of approximately 15.9 to 216.2, compared to a speedup of 4.5 for the current state-of-the-art machine learning approach.
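To make the two-level idea concrete, the following toy sketch shows one way a conditional sampler over fine-grained coordinates can be used to estimate the free energy of a coarse-grained coordinate by importance sampling. It is an illustration under stated assumptions, not the method of this work: the energy function `potential`, the coupling constant `k`, and the helpers `cg_free_energy` and `exact_free_energy` are hypothetical, and a fixed Gaussian `q(x | s)` stands in for the trained conditional normalizing flow. The toy energy is chosen so that the exact answer is known in closed form.

```python
import numpy as np


def potential(s, x, k=4.0):
    """Hypothetical toy energy: double well in the coarse-grained
    coordinate s, harmonic coupling of the fine-grained coordinate x to s."""
    return (s**2 - 1.0) ** 2 + 0.5 * k * (x - s) ** 2


def cg_free_energy(s, n_samples=20000, beta=1.0, sigma=0.6, seed=0):
    """Importance-sampling estimate of F(s) = -(1/beta) log int exp(-beta U(s, x)) dx.

    A Gaussian q(x | s) = N(s, sigma^2) is an assumed stand-in for a
    conditional normalizing flow; any sampler with a tractable density works.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=s, scale=sigma, size=n_samples)
    # Log-density of the proposal at the sampled points.
    log_q = -0.5 * ((x - s) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
    # Log importance weights: unnormalized Boltzmann weight over proposal density.
    log_w = -beta * potential(s, x) - log_q
    # Numerically stable log of the mean weight.
    m = log_w.max()
    log_mean_w = m + np.log(np.mean(np.exp(log_w - m)))
    return -log_mean_w / beta


def exact_free_energy(s, beta=1.0, k=4.0):
    """Closed-form result for the toy energy (Gaussian integral over x)."""
    return (s**2 - 1.0) ** 2 - 0.5 * np.log(2.0 * np.pi / (beta * k)) / beta


if __name__ == "__main__":
    for s in (-1.0, 0.0, 0.5):
        print(s, cg_free_energy(s), exact_free_energy(s))
```

In this toy setting each energy evaluation inside `cg_free_energy` plays the role of an all-atom potential energy call, so the number of samples drawn per coarse-grained point is the cost that an active learning scheme would try to spend only where needed.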