We describe how interpretable boosting algorithms based on ridge-regularized generalized linear models can be used to analyze high-dimensional environmental data. We illustrate the approach by using environmental, social, human and biophysical data to predict the financial vulnerability of farmers in Chile and Tunisia to climate hazards. We show how group structures can be taken into account and how interactions can be detected in high-dimensional datasets using a novel two-step boosting approach. The advantages and efficacy of the proposed method are demonstrated and discussed. Results indicate that interaction effects improve predictive power only when they are included via two-step boosting. The most important predictor for all types of vulnerability is natural assets. Other important variables are the type of irrigation, economic assets and the presence of crop damage on nearby farms.
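To make the two-step idea concrete, the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes a squared-error loss (a Gaussian model) in place of a general GLM, treats each group of variables as one ridge-penalized linear base learner, and builds interaction candidates as pairwise products of the variables selected in the first step. The function names (`ridge_fit`, `boost`, `two_step_boost`) and the parameters `nu`, `lam` and `n_iter` are hypothetical choices made for illustration.

```python
from itertools import combinations

import numpy as np


def ridge_fit(X, r, lam):
    """Ridge-penalized least-squares fit of the residual r on the columns of X."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ r)


def boost(X, y, groups, offset=None, n_iter=200, nu=0.1, lam=1.0):
    """Component-wise L2 boosting with one ridge base learner per group.

    groups is a list of column-index lists. Returns the coefficients, the
    fitted values, and how often each group was selected.
    """
    n, p = X.shape
    coef = np.zeros(p)
    fit = np.zeros(n) if offset is None else np.array(offset, dtype=float)
    counts = np.zeros(len(groups), dtype=int)
    for _ in range(n_iter):
        resid = y - fit  # negative gradient of the squared-error loss
        best = None
        for g, cols in enumerate(groups):
            beta = ridge_fit(X[:, cols], resid, lam)
            sse = np.sum((resid - X[:, cols] @ beta) ** 2)
            if best is None or sse < best[0]:
                best = (sse, g, beta)
        _, g, beta = best
        cols = groups[g]
        coef[cols] += nu * beta  # small step towards the best-fitting group
        fit += nu * (X[:, cols] @ beta)
        counts[g] += 1
    return coef, fit, counts


def two_step_boost(X, y, groups, **kw):
    """Step 1: boost main effects. Step 2: boost pairwise interactions of the
    selected variables, keeping the main-effects fit fixed as an offset."""
    offset = np.full(len(y), y.mean())  # intercept-only starting model
    coef_main, fit_main, counts = boost(X, y, groups, offset=offset, **kw)
    selected = sorted(
        c for g, cols in enumerate(groups) if counts[g] > 0 for c in cols
    )
    pairs = list(combinations(selected, 2))
    if not pairs:
        return coef_main, {}, fit_main, fit_main
    Z = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    singletons = [[k] for k in range(len(pairs))]
    coef_int, fit, counts_int = boost(Z, y, singletons, offset=fit_main, **kw)
    interactions = {
        pairs[k]: coef_int[k] for k in range(len(pairs)) if counts_int[k] > 0
    }
    return coef_main, interactions, fit_main, fit
```

Because the second step starts from the fitted main-effects model, any selected interaction can be read as an addition to an already interpretable linear predictor. For a binary or count outcome, the residual in `boost` would be replaced by the negative gradient of the corresponding GLM loss.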