From Peptides to Nanostructures: A Euclidean Transformer for Fast and Stable Machine Learned Force Fields

Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the reliability of MLFFs in molecular dynamics (MD) simulations is facing growing scrutiny due to concerns about instability over extended simulation timescales. Our findings suggest a potential connection between robustness to cumulative inaccuracies and the use of equivariant representations in MLFFs, but the computational cost associated with these representations can limit this advantage in practice. To address this, we propose a transformer architecture called SO3krates that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that separates invariant and equivariant information, eliminating the need for expensive tensor products. SO3krates achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on extended time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms. Furthermore, we investigate the topology of the potential energy surface (PES) for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3krates balances two conflicting demands: stability, and the emergence of new minimum-energy conformations beyond the training data. This balance is crucial for realistic exploration tasks in biochemistry.
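
The separation of invariant and equivariant information described above can be illustrated with a minimal NumPy sketch. It is not the SO3krates implementation: it assumes only degree-1 (vector) equivariant features, a single attention head, and a simple distance penalty in the attention logits, and the names `invariant_attention_step`, `w_q`, and `w_k` are hypothetical. The point it shows is that attention coefficients computed purely from rotation-invariant quantities can weight both the invariant features and the direction vectors, so the vector branch stays equivariant without any tensor product.

```python
import numpy as np


def invariant_attention_step(h, x, pos, w_q, w_k):
    """One simplified attention update with separate invariant/equivariant branches.

    h   : (n, F) rotation-invariant per-atom features
    x   : (n, 3) degree-1 equivariant per-atom features (assumed simplification)
    pos : (n, 3) atomic positions, n >= 2
    w_q, w_k : (F, D) hypothetical query/key projection matrices
    """
    n = pos.shape[0]
    r = pos[None, :, :] - pos[:, None, :]      # r[i, j] = pos[j] - pos[i]
    d = np.linalg.norm(r, axis=-1)             # pairwise distances (invariant)
    mask = ~np.eye(n, dtype=bool)              # exclude self-interaction

    # Unit direction vectors between distinct atoms; these rotate with the system.
    unit = np.zeros_like(r)
    unit[mask] = r[mask] / d[mask][:, None]

    # Attention logits are built from invariants only: features and distances.
    logits = (h @ w_q) @ (h @ w_k).T / np.sqrt(w_q.shape[1]) - d
    logits = np.where(mask, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    alpha = np.exp(logits)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)

    # Invariant branch: weighted sum of neighbor features.
    h_new = h + alpha @ h
    # Equivariant branch: the same scalar weights applied to direction vectors.
    x_new = x + np.einsum("ij,ijc->ic", alpha, unit)
    return h_new, x_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, F, D = 5, 4, 3
    h = rng.normal(size=(n, F))
    x = rng.normal(size=(n, 3))
    pos = rng.normal(size=(n, 3))
    w_q = rng.normal(size=(F, D))
    w_k = rng.normal(size=(F, D))

    # Rotate the input about the z-axis and compare with the rotated output.
    t = 0.7
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t), np.cos(t), 0.0],
                  [0.0, 0.0, 1.0]])
    h1, x1 = invariant_attention_step(h, x, pos, w_q, w_k)
    h2, x2 = invariant_attention_step(h, x @ R.T, pos @ R.T, w_q, w_k)
    assert np.allclose(h1, h2)          # invariant features are unchanged
    assert np.allclose(x1 @ R.T, x2)    # vector features rotate with the input
```

Because the attention weights depend only on invariants, rotating the input leaves the invariant features unchanged and rotates the vector features by the same rotation, which is what the check in the `__main__` block verifies.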