Predicting the crystal system from X-ray diffraction (XRD) spectra is a critical task in materials science, particularly for perovskite materials, which are known for their diverse applications in photovoltaics, optoelectronics, and catalysis. In this study, we present a machine learning (ML)-driven framework that classifies crystal systems, point groups, and space groups from XRD data of perovskite materials. The framework uses a range of models, including Time Series Forest (TSF), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and a simple feedforward neural network (NN). To address class imbalance and enhance model robustness, we integrated class-balancing and data augmentation strategies such as the Synthetic Minority Over-sampling Technique (SMOTE), class weighting, jittering, and spectrum shifting, along with efficient data preprocessing pipelines. The TSF model with SMOTE augmentation achieved strong performance for crystal system prediction, with a Matthews correlation coefficient (MCC) of 0.9, an F1 score of 0.92, and an accuracy of 97.76%. For point group and space group prediction, balanced accuracies above 95% were obtained. The model performed particularly well on symmetry-distinct classes, including cubic crystal systems, point groups 3m and m-3m, and space groups Pnma and Pnnn. This work highlights the potential of ML for XRD-based structural characterization and accelerated discovery of perovskite materials.
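As a rough illustration of two of the augmentation strategies named above, jittering and spectrum shifting can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function names `jitter` and `shift`, the noise level `sigma`, the shift size, the zero-padding at the edge, and the toy single-peak pattern are all assumptions made for illustration.

```python
import numpy as np


def jitter(spectrum, sigma=0.01, rng=None):
    """Add zero-mean Gaussian noise scaled by the spectrum's peak intensity.

    `sigma` is an assumed relative noise level, not a value from the study.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = sigma * np.max(np.abs(spectrum))
    noisy = spectrum + rng.normal(0.0, scale, size=spectrum.shape)
    return np.clip(noisy, 0.0, None)  # keep intensities non-negative


def shift(spectrum, n_bins):
    """Shift the pattern along the 2-theta axis by `n_bins` grid points.

    Points that move in from outside the measured range are zero-filled
    (an assumption; other padding choices are possible).
    """
    shifted = np.zeros_like(spectrum)
    if n_bins > 0:
        shifted[n_bins:] = spectrum[:-n_bins]
    elif n_bins < 0:
        shifted[:n_bins] = spectrum[-n_bins:]
    else:
        shifted[:] = spectrum
    return shifted


# Toy pattern: one Gaussian peak on an assumed 0.1-degree 2-theta grid.
two_theta = np.linspace(10.0, 80.0, 701)
pattern = np.exp(-0.5 * ((two_theta - 31.0) / 0.2) ** 2)

rng = np.random.default_rng(0)
augmented = shift(jitter(pattern, sigma=0.01, rng=rng), n_bins=3)
```

Each augmented copy keeps the label of the pattern it was derived from, so minority classes can be enlarged without new measurements. Shifts would normally be kept small, on the assumption that they should mimic instrument offsets rather than move peaks to positions characteristic of a different structure.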