Nonstationary and non-Gaussian spatial data are prevalent across many fields (e.g., counts of animal species, disease incidences in susceptible regions, and remotely sensed satellite imagery). Due to modern data collection methods, the size of these datasets has grown considerably. Spatial generalized linear mixed models (SGLMMs) are a flexible class of models for nonstationary and non-Gaussian datasets. Despite their utility, SGLMMs can be computationally prohibitive for even moderately large datasets. To circumvent this issue, past studies have embedded nested radial basis functions into the SGLMM. However, two crucial specifications (knot placement and bandwidth parameters), which directly affect model performance, are typically fixed prior to model-fitting. We propose a novel approach to model large nonstationary and non-Gaussian spatial datasets using adaptive radial basis functions. Our approach: (1) partitions the spatial domain into subregions; (2) employs reversible-jump Markov chain Monte Carlo (RJMCMC) to infer the number and location of the knots within each partition; and (3) models the latent spatial surface using partition-varying and adaptive basis functions. Through an extensive simulation study, we show that our approach provides more accurate predictions than competing methods while preserving computational efficiency. We demonstrate our approach on two environmental datasets: incidences of plant species and counts of bird species in the United States.
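To make the role of the knots and bandwidths concrete, the following is a minimal sketch of a partition-varying radial basis expansion of a latent spatial surface. It is an illustration under stated assumptions, not the implementation used in this work: the Gaussian kernel form, the two-rectangle partition, the restriction of each basis function to its own partition, the Poisson response with log link, and all names (`rbf_design_matrix`, `partition_basis`, `knots_by_part`, `bw_by_part`) are hypothetical choices. The knots are held fixed here; the RJMCMC step that infers their number and location is not shown.

```python
import numpy as np


def rbf_design_matrix(locs, knots, bandwidth):
    """Evaluate Gaussian radial basis functions (an assumed kernel form).

    locs: (n, 2) array of locations; knots: (k, 2) array of knot locations.
    Returns an (n, k) matrix whose (i, j) entry is exp(-||s_i - u_j||^2 / (2 h^2)).
    """
    # Pairwise squared Euclidean distances via broadcasting: shape (n, k).
    d2 = ((locs[:, None, :] - knots[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def partition_basis(locs, labels, knots_by_part, bw_by_part):
    """Stack per-partition basis matrices column-wise.

    Assumption: a partition's basis functions are non-zero only at
    locations inside that partition, which gives a sparse block structure.
    """
    n = locs.shape[0]
    blocks = []
    for p in sorted(knots_by_part):
        block = np.zeros((n, knots_by_part[p].shape[0]))
        mask = labels == p
        block[mask] = rbf_design_matrix(locs[mask], knots_by_part[p], bw_by_part[p])
        blocks.append(block)
    return np.hstack(blocks)


rng = np.random.default_rng(0)
locs = rng.uniform(0.0, 1.0, size=(200, 2))

# Hypothetical partition: split the unit square at x = 0.5.
labels = (locs[:, 0] > 0.5).astype(int)

# Hypothetical fixed knots and bandwidths; these are the quantities an
# adaptive scheme would update rather than fix in advance.
knots_by_part = {
    0: rng.uniform([0.0, 0.0], [0.5, 1.0], size=(4, 2)),
    1: rng.uniform([0.5, 0.0], [1.0, 1.0], size=(6, 2)),
}
bw_by_part = {0: 0.2, 1: 0.1}

Phi = partition_basis(locs, labels, knots_by_part, bw_by_part)

# Latent surface as a linear combination of basis functions, then a
# Poisson count response with log link (one possible non-Gaussian model).
delta = rng.normal(size=Phi.shape[1])
w = Phi @ delta
counts = rng.poisson(np.exp(w))
```

In this sketch, changing the number of rows in `knots_by_part[p]` changes the column dimension of `Phi`, which is why inferring the number of knots calls for a trans-dimensional sampler such as RJMCMC.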