QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules

Supervised machine learning approaches are increasingly used to accelerate electronic structure prediction as surrogates for first-principles computational methods such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, accurate and efficient prediction of the Hamiltonian matrix is highly desirable, as it is the most important and fundamental physical quantity that determines the quantum states of physical systems and their chemical properties. In this work, we generate a new quantum Hamiltonian dataset, named QH9, that provides precise Hamiltonian matrices for 2,399 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.
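To illustrate why the Hamiltonian matrix determines the quantum states, the sketch below solves the generalized eigenvalue problem H C = S C ε that arises in a non-orthogonal atomic-orbital basis, where S is the overlap matrix. This is a minimal NumPy example under stated assumptions and is not the QH9 pipeline: the function name `orbital_energies` is hypothetical, and the 2×2 matrices are made-up toy values rather than data from the benchmark.

```python
import numpy as np

def orbital_energies(H, S):
    """Solve H C = S C diag(eps) for symmetric H and symmetric positive-definite S.

    Hypothetical helper for illustration; not part of the QH9 code base.
    """
    # Loewdin symmetric orthogonalization: build X = S^(-1/2)
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    # In the orthonormalized basis the problem becomes an ordinary eigenproblem
    H_ortho = X @ H @ X
    eps, C_ortho = np.linalg.eigh(H_ortho)
    # Transform the eigenvectors back to the original (non-orthogonal) basis
    C = X @ C_ortho
    return eps, C

# Toy matrices (made-up values, assumed only for this example)
H = np.array([[-1.0, -0.5],
              [-0.5, -0.5]])
S = np.array([[1.0, 0.2],
              [0.2, 1.0]])

eps, C = orbital_energies(H, S)
print("orbital energies:", eps)
```

Given a predicted Hamiltonian and the corresponding overlap matrix, the same step would yield orbital energies and coefficients without running a self-consistent DFT loop, which is the motivation for predicting the matrix directly.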

