Identifying reaction coordinates (RCs) is an active area of research, given the crucial role RCs play in determining the progress of a chemical reaction. The choice of reaction coordinate is often based on heuristic knowledge. However, an essential criterion is that the coordinate should capture both the reactant and product states unequivocally. The coordinate should also be the slowest one, so that all other degrees of freedom can easily equilibrate along it. We use a regularised sparse autoencoder, an energy-based model, to discover a crucial set of reaction coordinates. Along with discovering reaction coordinates, our model also predicts the evolution of a molecular dynamics (MD) trajectory. We show that including sparsity-enforcing regularisation helps in choosing a small but important set of reaction coordinates. We demonstrate our approach on two model systems: alanine dipeptide, and a proflavine and DNA system that exhibits intercalation of proflavine into the DNA minor groove in an aqueous environment. We model an MD trajectory as a multivariate time series, and our latent variable model performs multi-step time series prediction. This idea is inspired by the popular sparse coding approach, which represents each input sample as a linear combination of a few elements taken from a set of representative patterns.
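The sparse coding idea mentioned above can be illustrated with a minimal sketch. This is not the authors' regularised sparse autoencoder; it is the classical L1-penalised formulation, solved here with ISTA (iterative soft-thresholding), on synthetic data. The dictionary `D`, the penalty weight `lam`, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x, D, z, lam):
    """Sparse coding objective: 0.5 * ||x - D z||^2 + lam * ||z||_1."""
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))

def sparse_code(x, D, lam=0.05, n_iter=2000):
    """Approximate the sparse code of x in dictionary D using ISTA."""
    # Step size 1/L, where L is the Lipschitz constant of the smooth term's gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)          # gradient of 0.5 * ||x - D z||^2
        z = soft_threshold(z - step * grad, step * lam)
    return z

# Synthetic example (assumed data): 50 unit-norm patterns in 20 dimensions.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)

# Build an input as a linear combination of only three patterns.
z_true = np.zeros(50)
z_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
x = D @ z_true

z_hat = sparse_code(x, D, lam=0.05)
print("non-zero coefficients:", np.count_nonzero(z_hat))
print("reconstruction error:", np.linalg.norm(x - D @ z_hat))
```

The L1 penalty plays the role that the sparsity-enforcing regularisation plays in the abstract: it drives most coefficients to exactly zero, so each sample is explained by only a few patterns. In the setting described above, the analogous effect would be selecting a small set of candidate reaction coordinates, though the actual model and loss may differ from this sketch.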