Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials

Deep learning methods for electronic-structure Hamiltonian prediction have offered significant computational efficiency advantages over traditional DFT methods, yet the diversity of atomic types and structural patterns, together with the high-dimensional complexity of Hamiltonians, poses substantial challenges to generalization performance. In this work, we contribute on both the methodology and dataset sides to advance a universal deep learning paradigm for Hamiltonian prediction. On the method side, we propose NextHAM, a neural correction method with E(3) symmetry and high expressiveness for efficient and generalizable prediction of materials electronic-structure Hamiltonians. First, we introduce the zeroth-step Hamiltonians, which can be efficiently constructed from the initial charge density of DFT. They serve as informative descriptors for the neural regression model at the input level and as initial estimates of the target Hamiltonian at the output level, so that the regression model directly predicts the correction terms to the target ground truths, thereby significantly simplifying the input-output mapping to be learned. Second, we present a neural Transformer architecture with strict E(3) symmetry and high non-linear expressiveness for Hamiltonian prediction. Third, we propose a novel training objective that ensures the accuracy of Hamiltonians in both real space and reciprocal space, preventing error amplification and the occurrence of "ghost states" caused by the large condition number of the overlap matrix. On the dataset side, we curate a high-quality, broad-coverage, large benchmark, namely Materials-HAM-SOC, comprising 17,000 material structures spanning 68 elements from six rows of the periodic table and explicitly incorporating spin-orbit coupling (SOC) effects. Experimental results on Materials-HAM-SOC demonstrate that NextHAM achieves excellent accuracy and efficiency in predicting Hamiltonians and band structures.
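
The error-amplification issue mentioned above can be illustrated with a toy example. The sketch below is not NextHAM's training objective or code; it is a minimal two-orbital illustration, with all matrices and names (`generalized_eigvals`, `H_ref`, `H_pred`, `S`) chosen here as assumptions. It solves the generalized eigenproblem H c = E S c for a nearly singular overlap matrix S and shows that a real-space Hamiltonian error of 1e-3 per element can shift an eigenvalue by a far larger amount.

```python
import numpy as np


def generalized_eigvals(H, S):
    """Eigenvalues of H c = E S c for real symmetric H and positive-definite S."""
    L = np.linalg.cholesky(S)      # S = L L^T
    L_inv = np.linalg.inv(L)
    # Reduce to an ordinary symmetric eigenproblem: (L^-1 H L^-T) y = E y
    return np.linalg.eigvalsh(L_inv @ H @ L_inv.T)


# Hypothetical overlap of two strongly overlapping basis functions.
# Its eigenvalues are 1.999 and 0.001, so it is badly conditioned.
S = np.array([[1.0, 0.999],
              [0.999, 1.0]])

# Hypothetical reference Hamiltonian (arbitrary energy units).
H_ref = np.array([[-1.0, -0.9995],
                  [-0.9995, -1.0]])

# A "prediction" in which every element is off by only 1e-3, with the error
# aligned with the small-eigenvalue direction of S (the worst case).
eps = 1e-3
H_pred = H_ref + eps * np.array([[1.0, -1.0],
                                 [-1.0, 1.0]])

real_space_err = np.max(np.abs(H_pred - H_ref))
band_err = np.max(np.abs(generalized_eigvals(H_pred, S)
                         - generalized_eigvals(H_ref, S)))

print(f"cond(S)                  = {np.linalg.cond(S):.0f}")
print(f"max |dH| (real space)    = {real_space_err:.1e}")
print(f"max |dE| (eigenvalues)   = {band_err:.3f}")
print(f"amplification |dE|/|dH|  = {band_err / real_space_err:.0f}")
```

In this toy setting the eigenvalue error exceeds the element-wise error by roughly the condition number of S, which is why a loss defined only on real-space matrix elements may not control band-structure accuracy.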
