Identifying reaction coordinates (RCs) is an active area of research, given the crucial role RCs play in determining the progress of a chemical reaction. The choice of reaction coordinate is often based on heuristic knowledge. However, an essential criterion is that the coordinate should capture both the reactant and product states unequivocally. The coordinate should also be the slowest degree of freedom, so that all other degrees of freedom can equilibrate easily along it. We used a regularised sparse autoencoder, an energy-based model, to discover a crucial set of reaction coordinates. Along with discovering reaction coordinates, our model also predicts the evolution of a molecular dynamics (MD) trajectory: we model the MD trajectory as a multivariate time series, and our latent variable model performs multi-step time series prediction. We showed that including sparsity-enforcing regularisation helps in choosing a small but important set of reaction coordinates. This idea is inspired by the popular sparse coding approach, which represents each input sample as a linear combination of a few elements taken from a set of representative patterns. We demonstrated our approach on two model systems: alanine dipeptide, and a proflavine-DNA system that exhibits intercalation of proflavine into the DNA minor groove in an aqueous environment.
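The sparse coding idea mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the authors' autoencoder: it assumes a fixed random dictionary `D` as a stand-in for learned representative patterns, synthetic data rather than MD frames, and uses ISTA (iterative soft-thresholding) as one standard way to solve the L1-regularised problem min_z 0.5*||x - D z||^2 + lam*||z||_1. All names (`soft_threshold`, `ista_sparse_code`, `objective`, `lam`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v towards zero by t; entries within [-t, t] become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(x, D, z, lam):
    """Sparse coding objective: reconstruction error plus L1 penalty on the code."""
    residual = x - D @ z
    return 0.5 * residual @ residual + lam * np.abs(z).sum()


def ista_sparse_code(x, D, lam=0.05, n_iter=2000):
    """Approximate the sparse code z of x over dictionary D using ISTA."""
    # Lipschitz constant of the gradient of the quadratic term.
    lipschitz = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)  # gradient of 0.5 * ||x - D z||^2
        z = soft_threshold(z - grad / lipschitz, lam / lipschitz)
    return z


# Synthetic example (assumption): 50 patterns in a 20-dimensional input space.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)  # unit-norm dictionary columns

# Build an input that truly is a combination of only three patterns.
z_true = np.zeros(50)
z_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
x = D @ z_true

lam = 0.05
z_hat = ista_sparse_code(x, D, lam=lam)

print("non-zero coefficients:", np.count_nonzero(z_hat))
print("reconstruction error:", np.linalg.norm(x - D @ z_hat))
```

The L1 penalty plays the same role as the sparsity-enforcing regularisation described above: larger `lam` drives more coefficients to exactly zero, so fewer patterns are used to explain each input, at the cost of a larger reconstruction error.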