In real-world materials research, machine learning (ML) models are usually expected to predict and discover novel, exceptional materials that deviate from known materials. Objectively evaluating how well ML models predict the properties of out-of-distribution (OOD) materials, i.e., materials that differ from the training set distribution, is therefore a pressing question. The traditional practice of evaluating materials property prediction models on randomly split datasets frequently yields artificially high performance estimates because of the inherent redundancy of typical materials datasets. Here we present a comprehensive benchmark study of structure-based graph neural networks (GNNs) for extrapolative OOD materials property prediction. We formulate five different categories of OOD ML problems for three benchmark datasets from the MatBench study. Our extensive experiments show that, on average, current state-of-the-art GNN algorithms significantly underperform on the OOD property prediction tasks compared with their MatBench baselines, revealing a crucial generalization gap in realistic materials prediction tasks. We further examine the latent physical spaces of these GNN models, identify the sources of the significantly more robust OOD performance of CGCNN, ALIGNN, and DeeperGATGNN relative to the current best models in the MatBench study (coGN and coNGN), and provide insights to improve their performance.
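To illustrate the difference between random splitting and an extrapolative OOD split, here is a minimal sketch. It assumes each material is reduced to a hypothetical `(material_id, target_value)` pair. The helpers `random_split` and `target_extrapolation_split` are illustrative names introduced here, and holding out the highest target values is only one generic style of OOD split. It is not necessarily one of the five categories formulated in this study.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: a material is represented as a (material_id, target_value) pair.
# This is an illustrative stand-in, not the data format used in the study.
Sample = Tuple[str, float]


def random_split(
    samples: Sequence[Sample], test_fraction: float, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Conventional random split: train and test follow the same distribution,
    so near-duplicate materials can end up on both sides."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def target_extrapolation_split(
    samples: Sequence[Sample], test_fraction: float
) -> Tuple[List[Sample], List[Sample]]:
    """Hypothetical OOD split: hold out the samples with the highest target
    values, so the model must extrapolate beyond the training property range."""
    ordered = sorted(samples, key=lambda s: s[1])
    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    data = [(f"mat-{i}", float(i)) for i in range(10)]
    train, test = target_extrapolation_split(data, 0.2)
    print([value for _, value in test])  # [8.0, 9.0]
```

Under the random split, redundant materials can appear in both the training and test sets, which inflates the measured accuracy. Under the extrapolative split, every test target is at least as large as the largest training target, so the test set probes genuine extrapolation.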