Supervised machine learning approaches are increasingly used to accelerate electronic structure prediction, serving as surrogates for first-principles computational methods such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, accurate and efficient prediction of the Hamiltonian matrix is highly desirable, because it is the most important and fundamental physical quantity that determines the quantum states of physical systems and their chemical properties. In this work, we generate a new quantum Hamiltonian dataset, named QH9, which provides precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and for accelerating molecular and materials design in scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.