Most existing machine-learning schemes applied to atomic-scale simulations rely on a local description of the geometry of a structure, and struggle to model effects that are driven by long-range physical interactions. Efforts to overcome these limitations have focused on the direct incorporation of electrostatics, which is the most prominent effect, often relying on architectures that mirror the functional form of explicit physical models. Including other forms of non-bonded interactions, or predicting properties other than the interatomic potential, requires ad hoc modifications. We propose an alternative approach that extends the long-distance equivariant (LODE) framework to generate local descriptors of an atomic environment that resemble non-bonded potentials with arbitrary asymptotic behaviors, ranging from point-charge electrostatics to dispersion forces. We show that the LODE formalism is amenable to a direct physical interpretation in terms of a generalized multipole expansion, which simplifies its implementation and reduces the number of descriptors needed to capture a given asymptotic behavior. These generalized LODE features provide improved extrapolation capabilities when trained on structures dominated by a given asymptotic behavior, but do not help in capturing the wildly different energy scales that are relevant for a more heterogeneous data set. This approach provides a practical scheme to incorporate different types of non-bonded interactions, and a framework to investigate the interplay of physical and data-related considerations that underlies this challenging modeling problem.