Selection of contributing factors for predicting landslide susceptibility using machine learning and deep learning models

Landslides are a common natural disaster that can cause casualties, property damage, and economic losses. It is therefore important to understand or predict the probability of landslide occurrence at potentially risky sites. A common approach is to carry out a landslide susceptibility assessment based on a landslide inventory and a set of landslide contributing factors. This can be done with machine learning (ML) models such as logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), or with deep learning (DL) models such as convolutional neural network (CNN) and long short-term memory (LSTM). The landslide contributing factors that serve as input data for these models differ in their influence on landslide occurrence. It is therefore reasonable to select the more important contributing factors and eliminate the less relevant ones, with the aim of increasing the prediction accuracy of these models. However, selecting the more important factors remains a challenging task, and there is no generally accepted method. Furthermore, the effects of factor selection by different methods on the prediction accuracy of ML and DL models are unclear. In this study, the impact of the selection of contributing factors on the accuracy of landslide susceptibility predictions using ML and DL models was investigated. Five methods for selecting contributing factors were considered for all the aforementioned ML and DL models: Information Gain Ratio (IGR), Recursive Feature Elimination (RFE), Particle Swarm Optimization (PSO), Least Absolute Shrinkage and Selection Operator (LASSO), and Harris Hawk Optimization (HHO). In addition, autoencoder-based factor selection methods for DL models were investigated. To assess their performances, an exhaustive approach was adopted,...
