This study introduces a modified score matching method for generating molecular structures with high energy accuracy. The denoising process of score matching and diffusion models mirrors molecular structure optimization: the score acts like a physical force field that guides particles toward equilibrium states. To obtain energetically accurate structures, it can therefore be advantageous for the score to closely approximate the gradient of the actual potential energy surface. Unlike conventional methods, which define the target score from structural differences in Euclidean space, we propose a Riemannian score matching approach. This method represents molecular structures on a manifold defined by physics-informed internal coordinates, chosen to efficiently mimic the energy landscape, and performs noising and denoising within this space. We evaluate the method by refining several types of starting structures on the QM9 and GEOM datasets. The results show that the proposed Riemannian score matching method significantly improves the accuracy of the generated molecular structures, attaining chemical accuracy. These findings are relevant to a range of applications in computational chemistry, offering a robust tool for accurate molecular structure prediction.
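The analogy between a denoising score and a force can be made concrete with a small sketch. It is an illustration under stated assumptions, not the implementation used in this study: plain interatomic distances stand in for the physics-informed internal coordinates, and all function names are hypothetical. For Gaussian noise in Cartesian space, the conventional target is the negative gradient of a harmonic energy in the Cartesian displacement. The alternative below takes the negative Cartesian gradient of a harmonic energy in the internal-coordinate displacement, so it depends only on how bond-like distances deviate from the reference.

```python
import numpy as np


def pairwise_distances(x):
    """Assumed internal coordinates q(x): all interatomic distances of an (N, 3) array."""
    i, j = np.triu_indices(len(x), k=1)
    return np.linalg.norm(x[i] - x[j], axis=1)


def distance_jacobian(x):
    """Jacobian dq/dx with shape (n_pairs, 3N) for the distance coordinates."""
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    diff = x[i] - x[j]
    unit = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    jac = np.zeros((len(i), n, 3))
    rows = np.arange(len(i))
    jac[rows, i] = unit    # d|xi - xj| / d xi
    jac[rows, j] = -unit   # d|xi - xj| / d xj
    return jac.reshape(len(i), 3 * n)


def euclidean_target(x_noisy, x_ref, sigma):
    """Conventional target: -grad of |x - x_ref|^2 / (2 sigma^2)."""
    return -(x_noisy - x_ref) / sigma**2


def internal_target(x_noisy, x_ref, sigma):
    """Force-like target: -grad_x of |q(x) - q(x_ref)|^2 / (2 sigma^2)."""
    dq = pairwise_distances(x_noisy) - pairwise_distances(x_ref)
    jac = distance_jacobian(x_noisy)
    return -(jac.T @ dq).reshape(x_noisy.shape) / sigma**2
```

Because the distances are unchanged by rigid translation and rotation, `internal_target` gives the same result however the reference structure is placed in space, whereas `euclidean_target` does not. The matrix `jac.T @ jac` plays the role of a metric induced by the internal coordinates, which is one simple way to read the word "Riemannian" in this setting.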