Spatially correlated data with an excess of zeros, usually referred to as zero-inflated spatial data, arise in many disciplines. Examples include count data, such as the abundance (or absence) of animal species and disease counts, as well as semi-continuous data such as observed precipitation. Spatial two-part models are a flexible class of models for such data. Fitting two-part models can be computationally expensive for large data sets because of high-dimensional dependent latent variables, costly matrix operations, and slow-mixing Markov chains. We describe a flexible, computationally efficient approach for modeling large zero-inflated spatial data using the projection-based intrinsic conditional autoregression (PICAR) framework. We study our approach, which we call PICAR-Z, through extensive simulation studies and two environmental data sets. Our results suggest that PICAR-Z provides accurate predictions while remaining computationally efficient. An important goal of our work is to allow researchers who are not experts in computation to easily build computationally efficient extensions to zero-inflated spatial models; this also allows for a more thorough exploration of modeling choices in two-part models than was previously possible. We show that PICAR-Z is easy to implement and extend in popular probabilistic programming languages such as nimble and stan.
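To make the idea of a two-part model concrete, the following is a minimal sketch of the log-likelihood for one common two-part specification, a hurdle Poisson model for counts. It is an illustration only: the abstract does not state which two-part specifications PICAR-Z uses, the function name `hurdle_poisson_loglik` and its parameters are hypothetical, and the spatial random effects that would enter `pi` and `lam` in a spatial model are left out.

```python
import numpy as np
from math import lgamma


def hurdle_poisson_loglik(y, pi, lam):
    """Log-likelihood of a hurdle Poisson model (illustrative, non-spatial).

    y   : observed counts (non-negative integers)
    pi  : probability that an observation is non-zero, in (0, 1)
    lam : rate of the zero-truncated Poisson for positive counts, > 0

    In a spatial two-part model, pi and lam would each depend on
    covariates and spatial random effects; here they are passed in directly.
    """
    y = np.asarray(y, dtype=float)
    pi = np.broadcast_to(np.asarray(pi, dtype=float), y.shape)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), y.shape)
    positive = y > 0

    # Part 1: Bernoulli likelihood for zero versus non-zero.
    occurrence = np.where(positive, np.log(pi), np.log1p(-pi))

    # Part 2: zero-truncated Poisson, applied only to positive counts.
    log_factorial = np.array([lgamma(v + 1.0) for v in y])
    truncated = (
        y * np.log(lam) - lam - log_factorial - np.log1p(-np.exp(-lam))
    )
    return float(np.sum(occurrence + np.where(positive, truncated, 0.0)))


if __name__ == "__main__":
    counts = [0, 0, 3, 1, 0, 7]
    print(hurdle_poisson_loglik(counts, pi=0.5, lam=2.5))
```

The two parts separate cleanly: the zeros inform only `pi`, and the positive counts inform only `lam`, which is one reason two-part models are convenient to extend.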