It has become common to perform kinetic analysis using approximate Koopman operators that transform high-dimensional time series of observables into ranked dynamical modes. Key to the practical success of this approach is the identification of a set of observables that form a good basis in which to expand the slow relaxation modes. Good observables are, however, difficult to identify {\em a priori}, and sub-optimal choices can lead to significant underestimation of characteristic timescales. Leveraging the representation of slow dynamics in terms of a hidden Markov model (HMM), we propose a simple and computationally efficient clustering procedure to infer surrogate observables that form a good basis for the slow modes. We apply the approach to an analytically solvable model system, as well as to three protein systems of different complexity. We consistently demonstrate that the inferred indicator functions can significantly improve the estimation of the leading eigenvalues of the Koopman operators and correctly identify key states and transition timescales of stochastic systems, even when good observables are not known {\em a priori}.
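To make the link between indicator functions, Koopman eigenvalues, and timescales concrete, the following is a minimal sketch, not the clustering procedure proposed here. It assumes the trajectory has already been assigned to discrete states (the two-state chain, the helper names `simulate_chain`, `koopman_from_indicators`, and `implied_timescales`, and all parameter values are hypothetical choices for illustration). In a basis of state indicator functions, the Koopman matrix estimated from instantaneous and time-lagged covariances reduces to a row-stochastic transition matrix, and each non-trivial eigenvalue $\lambda_i$ at lag $\tau$ gives an implied timescale $t_i = -\tau / \ln \lambda_i$.

```python
import numpy as np


def simulate_chain(P, n_steps, seed=0):
    """Sample a discrete Markov chain with row-stochastic matrix P (toy data)."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(P, axis=1)
    u = rng.random(n_steps)
    n_states = P.shape[0]
    traj = np.empty(n_steps, dtype=int)
    state = 0
    for t in range(n_steps):
        traj[t] = state
        # Inverse-CDF sampling of the next state; clamp guards against round-off.
        state = min(int(np.searchsorted(cum[state], u[t], side="right")), n_states - 1)
    return traj


def koopman_from_indicators(labels, n_states, lag):
    """Estimate the Koopman matrix in a basis of state indicator functions."""
    labels = np.asarray(labels)
    x0, x1 = labels[:-lag], labels[lag:]
    chi0 = np.eye(n_states)[x0]      # one-hot indicators at time t
    chi1 = np.eye(n_states)[x1]      # one-hot indicators at time t + lag
    c00 = chi0.T @ chi0 / len(x0)    # instantaneous covariance (diagonal)
    c01 = chi0.T @ chi1 / len(x0)    # time-lagged covariance
    # pinv tolerates states that were never visited (zero diagonal entries).
    return np.linalg.pinv(c00) @ c01


def implied_timescales(K, lag):
    """Convert non-trivial eigenvalues in (0, 1) to timescales -lag / ln(lambda)."""
    eigvals = np.sort(np.real(np.linalg.eigvals(K)))[::-1]
    slow = eigvals[1:]               # drop the stationary eigenvalue (close to 1)
    slow = slow[(slow > 0.0) & (slow < 1.0)]
    return -lag / np.log(slow)


# Toy two-state system with exact eigenvalues 1 and 0.97.
P = np.array([[0.99, 0.01],
              [0.02, 0.98]])
traj = simulate_chain(P, n_steps=200_000, seed=0)
K = koopman_from_indicators(traj, n_states=2, lag=1)
print("estimated Koopman matrix:\n", K)
print("implied timescales:", implied_timescales(K, lag=1))
```

If the indicator functions resolve the metastable states poorly, the estimated eigenvalues are biased toward smaller values and the timescales are underestimated, which is the failure mode that a better choice of basis is meant to address.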