Machine learning (ML) plays an important role in quantum chemistry, providing fast-to-evaluate predictive models for various molecular properties. However, most existing ML models for molecular electronic properties are trained on density functional theory (DFT) databases as ground truth, so their prediction accuracy cannot surpass that of DFT. In this work, we develop a unified ML method for the electronic structure of organic molecules using gold-standard CCSD(T) calculations as training data. Tested on hydrocarbon molecules, our model outperforms DFT with the widely used hybrid and double-hybrid functionals in both computational cost and prediction accuracy for various quantum chemical properties. As case studies, we apply the model to the ground-state and excited-state properties of aromatic compounds and semiconducting polymers, demonstrating its accuracy and its ability to generalize to complex systems that are hard to calculate using CCSD(T)-level methods.