Abundance data are used in ecology for species monitoring and conservation. These count data often display specific characteristics such as many missing values, high variance, and a high proportion of zeros, particularly when monitoring rare species. We present a model that aims to impute missing data and to estimate the effect of covariates on species presence and abundance. It is based on the log-normal Poisson model, which offers more flexibility in the variance of counts than a Poisson model. A latent variable is added to account for the overrepresentation of zeros in the data. The imputation of missing data is made possible by including covariates and by assuming that the latent variance matrix has low rank. \\ We demonstrate the identifiability of the model in the presence of missing data. Since maximum likelihood inference is intractable, we use a variational expectation-maximization algorithm to infer the parameters. We provide an estimate of the asymptotic variance of the estimators and derive prediction intervals for the imputations, an estimate of the temporal trend, and a procedure for detecting a potential change in this trend. \\ We evaluate our imputations and the associated prediction intervals using an artificially degraded monitoring data set. We conclude with an illustration on a waterbird monitoring data set.
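To make the model structure described in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a sites-by-years count matrix, a latent Gaussian layer whose mean is linear in the covariates and whose variance matrix is `C @ C.T` with few columns in `C` (the low-rank assumption), a Bernoulli presence variable for the excess zeros, and entries missing completely at random. The function name `simulate_zi_pln`, all parameter names, and all default values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_zi_pln(n_sites=50, n_years=20, n_covariates=2, rank=2,
                    zero_inflation=0.3, missing_rate=0.2, seed=0):
    """Simulate counts from an assumed zero-inflated log-normal Poisson model.

    Every modelling choice here (dimensions, parameter scales, missingness
    mechanism) is an illustrative assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Site-level covariates and their (hypothetical) regression coefficients.
    X = rng.normal(size=(n_sites, n_covariates))
    B = rng.normal(scale=0.5, size=(n_covariates, n_years))

    # Low-rank factor: each latent row has variance matrix Sigma = C C^T.
    C = rng.normal(scale=0.5, size=(n_years, rank))
    Sigma = C @ C.T

    # Latent Gaussian layer: Z_i ~ N(X_i B, C C^T), built from rank-dim factors.
    W = rng.normal(size=(n_sites, rank))
    Z = X @ B + W @ C.T

    # Poisson counts given the latent layer (log-normal Poisson part).
    counts = rng.poisson(np.exp(Z))

    # Presence indicator: absent entries are forced to zero (excess zeros).
    present = rng.random((n_sites, n_years)) >= zero_inflation
    Y = np.where(present, counts, 0).astype(float)

    # Observation mask: unobserved entries are stored as NaN.
    observed = rng.random((n_sites, n_years)) >= missing_rate
    Y_obs = np.where(observed, Y, np.nan)

    return {"X": X, "Y": Y, "Y_obs": Y_obs, "observed": observed,
            "present": present, "Sigma": Sigma}


if __name__ == "__main__":
    sim = simulate_zi_pln()
    obs = sim["Y_obs"][sim["observed"]]
    print("share of missing entries:", np.isnan(sim["Y_obs"]).mean())
    print("share of zeros among observed entries:", (obs == 0).mean())
    print("rank of latent variance matrix:", np.linalg.matrix_rank(sim["Sigma"]))
```

Data simulated this way can serve as a test bed for imputation: the entries hidden by the mask are known in `Y`, so imputed values can be compared against them, in the spirit of the artificially degraded data set mentioned in the abstract.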