The rapid expansion of citizen science initiatives has led to significant growth in biodiversity databases, particularly in presence-only (PO) observations. PO data are invaluable for understanding species distributions and their dynamics, but their use in species distribution models (SDMs) is limited by sampling biases and the lack of information on absences. Poisson point processes are widely used for SDMs, with Maxent being one of the most popular methods. Maxent maximises the entropy of a probability distribution across sites as a function of predefined transformations of variables, called features. In contrast, neural networks and deep learning have emerged as promising techniques for automatic feature extraction from complex input variables: arbitrarily complex transformations of input variables can be learned efficiently from the data through backpropagation and stochastic gradient descent (SGD). In this paper, we propose DeepMaxent, which harnesses neural networks to automatically learn features shared among species, using the maximum entropy principle. To do so, it employs a normalised Poisson loss in which, for each species, presence probabilities across sites are modelled by a neural network. We evaluate DeepMaxent on a benchmark dataset known for its spatial sampling biases, using PO data for calibration and presence-absence (PA) data for validation across six regions with different biological groups and covariates. Our results indicate that DeepMaxent performs better than Maxent and other leading SDMs across all regions and taxonomic groups. The method performs particularly well in regions of uneven sampling, demonstrating substantial potential to improve SDM performance. In particular, our approach yields more accurate predictions than traditional single-species models, which opens up new possibilities for methodological enhancement.
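The idea of a normalised Poisson loss with features shared across species can be illustrated with a small NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the normalisation is taken to be a softmax over sites for each species, the network is a single hidden layer with one linear output per species, and all names (`shared_feature_logits`, `normalised_po_loss`, the array shapes, the synthetic data) are hypothetical rather than taken from DeepMaxent.

```python
import numpy as np


def log_softmax_over_sites(logits):
    """Log of a per-species probability distribution over sites (axis 0)."""
    # Subtract the per-species maximum for numerical stability.
    shifted = logits - logits.max(axis=0, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))


def normalised_po_loss(logits, counts):
    """Mean negative log-likelihood of PO counts (assumed form of the loss).

    logits, counts: arrays of shape (n_sites, n_species).
    """
    log_p = log_softmax_over_sites(logits)
    return -(counts * log_p).sum() / counts.sum()


def shared_feature_logits(x, w_hidden, w_out):
    """Hypothetical network: hidden features shared by all species,
    followed by one linear output per species."""
    hidden = np.maximum(x @ w_hidden, 0.0)  # ReLU features shared across species
    return hidden @ w_out                   # shape (n_sites, n_species)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n_sites, n_covariates, n_hidden, n_species = 50, 4, 8, 3
x = rng.normal(size=(n_sites, n_covariates))
w_hidden = rng.normal(size=(n_covariates, n_hidden))
w_out = rng.normal(size=(n_hidden, n_species))
counts = rng.poisson(0.3, size=(n_sites, n_species)).astype(float)

logits = shared_feature_logits(x, w_hidden, w_out)
probs = np.exp(log_softmax_over_sites(logits))  # each column sums to 1 over sites
loss = normalised_po_loss(logits, counts)
```

In this sketch, each column of `probs` is a distribution over sites for one species, which mirrors the Maxent view of a probability distribution across sites. In practice the weights would be trained by SGD with an automatic differentiation library rather than left at random values as they are here.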