MaskSDM with Shapley values to improve flexibility, robustness, and explainability in species distribution modeling

Species Distribution Models (SDMs) play a vital role in biodiversity research, conservation planning, and ecological niche modeling by predicting species distributions based on environmental conditions. The selection of predictors is crucial: it strongly affects both model accuracy and how well the predictions reflect ecological patterns. To yield meaningful insights, input variables must be chosen carefully to match the study objectives and the ecological requirements of the target species. However, existing SDMs, both traditional and deep learning-based, often lack key capabilities for variable selection: (i) flexibility to choose relevant predictors at inference without retraining; (ii) robustness to handle missing predictor values without compromising accuracy; and (iii) explainability to interpret and accurately quantify each predictor's contribution. To overcome these limitations, we introduce MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy. This approach allows the model to make predictions with arbitrary subsets of input variables while remaining robust to missing data. It also provides a clearer understanding of how adding or removing a given predictor affects model performance and predictions. Additionally, MaskSDM leverages Shapley values for precise predictor contribution assessments, improving upon traditional approximations. We evaluate MaskSDM on the global sPlotOpen dataset, modeling the distributions of 12,738 plant species. Our results show that MaskSDM outperforms imputation-based methods and approximates the performance of models trained on specific subsets of variables. These findings underscore MaskSDM's potential to increase the applicability and adoption of SDMs, laying the groundwork for foundation models for species distribution modeling that can be readily applied to diverse ecological applications.
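To illustrate why a model that accepts arbitrary predictor subsets makes Shapley values directly computable, here is a minimal sketch in Python. It is not MaskSDM's implementation: `toy_masked_model`, its weights, and the three-predictor input are hypothetical stand-ins for a mask-aware network, chosen only so the result can be checked by hand. The `shapley_values` function applies the standard Shapley formula by enumerating every subset of the remaining predictors, which costs on the order of 2^n model evaluations and is therefore only practical for a small number of predictors or predictor groups.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all subsets of predictors.

    value_fn takes a frozenset of present predictor indices and returns
    the model output when only those predictors are unmasked.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of predictor i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_masked_model(x, present):
    """Hypothetical stand-in for a mask-aware model (NOT MaskSDM itself).

    Masked predictors contribute nothing; predictors 0 and 1 also interact
    when both are present.
    """
    weights = [0.5, -0.2, 0.1]  # illustrative values only
    score = sum(w * x[j] for j, w in enumerate(weights) if j in present)
    if 0 in present and 1 in present:
        score += 0.3 * x[0] * x[1]
    return score


x = [1.0, 2.0, 3.0]  # one sample with three made-up predictor values
phi = shapley_values(lambda s: toy_masked_model(x, s), n_features=3)

full = toy_masked_model(x, frozenset({0, 1, 2}))
empty = toy_masked_model(x, frozenset())
print(phi, sum(phi), full - empty)
```

In this toy case the interaction between predictors 0 and 1 is split equally between them, and the contributions sum to the difference between the fully unmasked and fully masked outputs (the efficiency property of Shapley values). A model that cannot take a subset of predictors as input needs some stand-in for the missing ones, such as imputation, to evaluate `value_fn`.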

