Selection of contributing factors for predicting landslide susceptibility using machine learning and deep learning models

Landslides are a common natural disaster that can cause casualties, threats to property, and economic losses. It is therefore important to understand or predict the probability of landslide occurrence at potentially at-risk sites. A common approach is to carry out a landslide susceptibility assessment based on a landslide inventory and a set of landslide contributing factors. This can be done using machine learning (ML) models such as logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), or deep learning (DL) models such as convolutional neural network (CNN) and long short-term memory (LSTM). As input data for these models, the contributing factors influence landslide occurrence to different degrees. It is therefore reasonable to select the more important contributing factors and eliminate the less relevant ones, with the aim of increasing the prediction accuracy of these models. However, selecting the more important factors remains a challenging task, and there is no generally accepted method. Furthermore, the effects of factor selection by different methods on the prediction accuracy of ML and DL models are unclear. In this study, the impact of contributing factor selection on the accuracy of landslide susceptibility predictions using ML and DL models was investigated. Five methods for selecting contributing factors were considered for all the aforementioned ML and DL models: Information Gain Ratio (IGR), Recursive Feature Elimination (RFE), Particle Swarm Optimization (PSO), Least Absolute Shrinkage and Selection Operator (LASSO), and Harris Hawk Optimization (HHO). In addition, autoencoder-based factor selection methods for DL models were also investigated. To assess their performances, an exhaustive approach was adopted,...
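To make the idea of ranking contributing factors concrete, the following is a minimal sketch of the Information Gain Ratio criterion applied to a toy dataset. It is not the implementation used in the study: the factor names, class labels, and inventory values are hypothetical, and the factors are assumed to be already discretized into classes. The score is the information gain of a factor about the landslide label, divided by the entropy of the factor itself.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain_ratio(factor, labels):
    """Information gain of `factor` about `labels`, divided by the factor's own entropy."""
    total = len(labels)
    # Group the labels by factor class to compute H(labels | factor).
    groups = {}
    for value, label in zip(factor, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(factor)
    if split_info == 0:  # a constant factor carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


# Hypothetical toy inventory: 1 = landslide cell, 0 = non-landslide cell.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical, pre-discretized contributing factors (assumed names and classes).
factors = {
    "slope_class": ["steep"] * 4 + ["gentle"] * 4,
    "lithology": ["a", "a", "a", "b", "a", "b", "b", "b"],
    "aspect_class": ["n", "s", "n", "s", "n", "s", "n", "s"],
}

scores = {name: information_gain_ratio(values, labels) for name, values in factors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy case the slope class separates the two label groups perfectly and ranks first, while the aspect class is independent of the labels and scores zero. A selection step would then keep the top-ranked factors; the cutoff is a modelling choice that this sketch does not prescribe.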
