Supervised machine learning is increasingly used to accelerate electronic structure prediction, serving as a surrogate for first-principles computational methods such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, accurate and efficient prediction of the Hamiltonian matrix is highly desirable, because it is the most important and fundamental physical quantity: it determines the quantum states of physical systems and their chemical properties. In this work, we generate a new quantum Hamiltonian dataset, named QH9, which provides precise Hamiltonian matrices for 2,399 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and for accelerating molecular and materials design in scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.
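To make concrete why the Hamiltonian matrix determines the quantum states, the sketch below assumes the standard setting of a non-orthogonal atomic-orbital basis, where orbital energies and coefficients follow from the generalized eigenvalue problem $HC = SC\epsilon$, with $S$ the overlap matrix. The function name `solve_generalized_eigenproblem` and the 2x2 toy matrices are hypothetical illustrations in arbitrary units; they are not part of the QH9 benchmark code or data.

```python
import numpy as np


def solve_generalized_eigenproblem(H, S):
    """Solve H C = S C diag(eps) for symmetric H and positive-definite S.

    Hypothetical helper for illustration only; not taken from QHBench.
    """
    # Loewdin symmetric orthogonalization: build X = S^(-1/2).
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    # Transform H to the orthonormal basis and diagonalize it there.
    eps, C_ortho = np.linalg.eigh(X.T @ H @ X)
    # Map the eigenvectors back to the original non-orthogonal basis.
    C = X @ C_ortho
    return eps, C


# Toy two-orbital system (made-up numbers, arbitrary units).
H = np.array([[-1.0, -0.5],
              [-0.5, -1.0]])
S = np.array([[1.0, 0.25],
              [0.25, 1.0]])

eps, C = solve_generalized_eigenproblem(H, S)

# With one occupied orbital, the gap is the difference of the two energies.
n_occupied = 1
gap = eps[n_occupied] - eps[n_occupied - 1]
```

Once a model predicts $H$ (and $S$ is computed from the geometry), derived quantities such as orbital energies and the HOMO-LUMO gap follow from a single diagonalization of this kind, rather than from a full self-consistent calculation.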