Reliably identifying magnetic ground states remains a major challenge for high-throughput materials databases, where density functional theory (DFT) workflows often converge to ferromagnetic (FM) solutions. Here, we partially address this challenge by developing machine learning classifiers trained on experimentally validated magnetic materials from MAGNDATA, using a small number of simple compositional, structural, and electronic descriptors sourced from the Materials Project database. Our propagation vector classifiers achieve accuracies above 92%, outperforming recent studies in distinguishing zero from nonzero propagation vector structures, and expose a systematic ferromagnetic bias in the Materials Project database affecting more than 7,843 materials. In parallel, LightGBM and XGBoost models trained directly on the Materials Project labels achieve accuracies of 84-86% (with macro-averaged F1 scores of 63-66%), making them useful for large-scale screening of magnetic classes when refined by the MAGNDATA-trained classifiers. These results underscore the role of machine learning techniques as corrective and exploratory tools, enabling more trustworthy databases and accelerating the identification of materials with various properties.