Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model, but severe computational bottlenecks arise when spatial dependence is encoded as a latent Gaussian process (GP) in the increasingly common large-scale data settings on which we focus. The situation worsens in non-Gaussian models because reduced analytical tractability creates additional hurdles to computational efficiency. In this article, we introduce Bayesian models of spatially referenced data in which the likelihood or the latent process (or both) are not Gaussian. First, we exploit the advantages of spatial processes built via directed acyclic graphs, in which case the spatial nodes enter the Bayesian hierarchy and lead to posterior sampling via routine Markov chain Monte Carlo (MCMC) methods. Second, motivated by the possible inefficiencies of popular gradient-based sampling approaches in the multivariate contexts on which we focus, we introduce the simplified manifold preconditioner adaptation (SiMPA) algorithm, which uses second-order information about the target but avoids expensive matrix operations. We demonstrate the performance and efficiency improvements of our methods relative to alternatives in extensive synthetic and real-world remote sensing and community ecology applications with large-scale data at up to hundreds of thousands of spatial locations and up to tens of outcomes. Software for the proposed methods is part of the R package 'meshed', available on CRAN.