Molecular dynamics (MD) simulates the time evolution of atomic systems governed by interatomic forces, and the fidelity of these simulations depends critically on the underlying force model. Classical force fields (CFFs) rely on fixed functional forms fitted to experimental or theoretical data, offering computational efficiency and broad applicability but limited accuracy in chemically diverse or reactive environments. In contrast, machine learning force fields (MLFFs) learn interatomic interactions directly from high-level electronic structure data and approach quantum-chemical accuracy at a fraction of the cost of quantum methods. Compared with CFFs, however, they introduce significant computational overhead, particularly in descriptor evaluation and neural network inference. These operations are challenging for parallel hardware because of irregular memory access, minimal data reuse, and inefficient kernel execution. This work investigates the hardware performance of such models using polyalanine chains, a novel benchmark molecular system with controllable input size. The chains serve as performance evaluation test cases that expose the computational bottlenecks of graphics processing units (GPUs) when scaling MLFF simulations. The analysis identifies key bottlenecks in descriptor computation, force computation, and memory handling, and highlights opportunities for improvement in MLFF-based MD for drug discovery, an emerging area that has received limited attention from a computer architecture perspective.