The advent of materials databases provides an unprecedented opportunity to uncover predictive descriptors for emergent material properties from a vast data space. However, the common reliance on high-throughput ab initio data necessarily inherits the limitations of such data, most notably their mismatch with experiments. On the other hand, experimental decisions are often guided by an expert's intuition, honed through experience and rarely articulated. We propose using machine learning to "bottle" such operational intuition into quantifiable descriptors, using expertly curated, measurement-based data. We introduce "Materials Expert-Artificial Intelligence" (ME-AI) to encapsulate and articulate this human intuition. As a first step toward such a program, we focus on topological semimetals (TSMs) among square-net materials. We chose this target property because experts have already identified a structural descriptor for it: the tolerance factor. We start by curating a dataset of 12 primary features for 879 square-net materials, using experimental data whenever possible. We then apply Dirichlet-based Gaussian process regression with a specialized kernel to reveal composite descriptors for square-net TSMs. The descriptors learned by ME-AI independently reproduce expert intuition and expand upon it. Specifically, the new descriptors point to hypervalency as a critical chemical feature for predicting TSMs among square-net compounds. Our success with a carefully defined problem suggests that the "machine bottling human insight" approach is promising for machine-learning-aided materials discovery.
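The abstract names the learning method but not its details. The following is a minimal sketch. It assumes the Dirichlet-based Gaussian process follows the classification-as-regression scheme of Milios et al. (2018). Binary labels (TSM or not) become per-class log-space regression targets with heteroscedastic noise. The sketch then fits one GP regressor per class and turns posterior samples into class probabilities. The squared-exponential `rbf_kernel` is a generic stand-in for the unspecified specialized kernel. All function names, the `alpha_eps=0.01` smoothing value, and the shared constant prior mean are hypothetical illustration choices, not ME-AI's implementation.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: a generic stand-in, NOT ME-AI's specialized kernel.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def dirichlet_targets(labels, n_classes, alpha_eps=0.01):
    # One-hot labels -> Dirichlet concentrations alpha (alpha_eps is hypothetical),
    # then log-normal moment matching gives regression targets and noise variances.
    alpha = np.full((len(labels), n_classes), alpha_eps)
    alpha[np.arange(len(labels)), labels] += 1.0
    sigma2 = np.log(1.0 / alpha + 1.0)
    y_tilde = np.log(alpha) - sigma2 / 2.0
    return y_tilde, sigma2


def fit_predict(X_train, labels, X_test, n_classes=2, lengthscale=1.0,
                n_samples=2000, seed=0):
    y_tilde, sigma2 = dirichlet_targets(labels, n_classes)
    prior_mean = y_tilde.mean()  # shared constant prior mean (hypothetical choice)
    K = rbf_kernel(X_train, X_train, lengthscale)
    K_s = rbf_kernel(X_test, X_train, lengthscale)
    rng = np.random.default_rng(seed)
    latent = np.empty((n_samples, len(X_test), n_classes))
    for c in range(n_classes):
        # Independent GP regression per class with heteroscedastic noise sigma2[:, c].
        L = np.linalg.cholesky(K + np.diag(sigma2[:, c]))
        w = np.linalg.solve(L.T, np.linalg.solve(L, y_tilde[:, c] - prior_mean))
        mean = prior_mean + K_s @ w
        V = np.linalg.solve(L, K_s.T)
        var = np.maximum(1.0 - np.sum(V**2, axis=0), 1e-12)  # prior variance is 1.0
        latent[:, :, c] = mean + np.sqrt(var) * rng.standard_normal((n_samples, len(X_test)))
    # exp(latent) mimics Gamma draws; normalising them yields Dirichlet-like probabilities.
    latent -= latent.max(axis=2, keepdims=True)  # stability shift, cancels on normalisation
    g = np.exp(latent)
    return (g / g.sum(axis=2, keepdims=True)).mean(axis=0)


# Toy usage: two well-separated 1-D clusters (made-up data, not the 879-material set).
X = np.array([[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]])
y = np.array([0, 0, 0, 1, 1, 1])
probs = fit_predict(X, y, np.array([[0.1], [3.3]]))
print(probs.round(2))
```

In this sketch, each row of `X_train` would hold one material's primary features and `labels` its TSM flag. With two classes, the shared constant prior mean keeps predicted probabilities near 0.5 far from any training data. The sketch therefore signals uncertainty rather than extrapolating confidently.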