Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles

From arXiv. Accepted as part of the J. Chem. Theory Comput. special issue "Machine Learning and Statistical Mechanics: Shared Synergies for Next Generation of Chemical Theory and Computation."

Understanding the behavior of complex molecular systems is a fundamental problem in physical chemistry. To describe the long-time dynamics of such systems, which is responsible for their most informative characteristics, we can identify a few slow collective variables (CVs) while treating the remaining fast variables as thermal noise. This enables us to simplify the dynamics and treat it as diffusion in a free-energy landscape spanned by slow CVs, effectively rendering the dynamics Markovian. Our recent statistical learning technique, spectral map [Rydzewski, J. Phys. Chem. Lett. 2023, 14, 22, 5216-5220], explores this strategy to learn slow CVs by maximizing a spectral gap of a transition matrix. In this work, we introduce several advancements into our framework, using a high-dimensional reversible folding process of a protein as an example. We implement an algorithm for coarse-graining Markov transition matrices to partition the reduced space of slow CVs kinetically and use it to define a transition state ensemble. We show that slow CVs learned by spectral map closely approach the Markovian limit for an overdamped diffusion. We demonstrate that coordinate-dependent diffusion coefficients only slightly affect the constructed free-energy landscapes. Finally, we present how spectral map can be used to quantify the importance of features and compare slow CVs with structural descriptors commonly used in protein folding. Overall, we demonstrate that a single slow CV learned by spectral map can be used as a physical reaction coordinate to capture essential characteristics of protein folding.

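To make the notion of a spectral gap of a transition matrix concrete, the sketch below builds a row-stochastic Markov matrix from samples of a one-dimensional candidate CV and measures the gap between its eigenvalues. It is a simplified illustration and not the authors' implementation: the plain Gaussian kernel, the bandwidth `eps`, the number of metastable states `n_states`, and the synthetic samples are all assumptions made for this example, and no neural network is trained here. The point is only that a CV separating two metastable states yields a much larger gap than a CV with no such separation, which is the quantity spectral map maximizes.

```python
import numpy as np


def gaussian_kernel(z, eps):
    """Pairwise Gaussian kernel between CV samples (assumed kernel form)."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / eps)


def transition_matrix(kernel):
    """Row-normalize the kernel so that each row sums to one."""
    return kernel / kernel.sum(axis=1, keepdims=True)


def spectral_gap(kernel, n_states):
    """Gap between eigenvalues n_states-1 and n_states of the Markov matrix.

    The symmetric matrix D^{-1/2} K D^{-1/2} has the same eigenvalues as the
    row-normalized matrix D^{-1} K, so eigvalsh can be used and the spectrum
    is guaranteed to be real.
    """
    deg = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(deg, deg))
    eigvals = np.sort(np.linalg.eigvalsh(sym))[::-1]  # descending order
    return eigvals[n_states - 1] - eigvals[n_states]


rng = np.random.default_rng(0)

# Hypothetical CV samples: two well-separated states vs. one broad state.
bimodal = np.concatenate([rng.normal(-2.0, 0.2, 100), rng.normal(2.0, 0.2, 100)])
unimodal = rng.normal(0.0, 2.0, 200)

K_bi = gaussian_kernel(bimodal, eps=1.0)
K_uni = gaussian_kernel(unimodal, eps=1.0)

gap_bimodal = spectral_gap(K_bi, n_states=2)
gap_unimodal = spectral_gap(K_uni, n_states=2)

print(f"gap (two metastable states): {gap_bimodal:.3f}")
print(f"gap (single broad state):    {gap_unimodal:.3f}")
```

In a learning setting, a gap of this kind would serve as the objective to maximize over the parameters of the map from features to CVs; here the CV values are simply given so the gap can be inspected directly.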