Flexible Basis Representations for Modeling High-Dimensional Hierarchical Spatial Data

Nonstationary and non-Gaussian spatial data are prevalent across many fields (e.g., counts of animal species, disease incidences in susceptible regions, and remotely sensed satellite imagery). Due to modern data collection methods, the size of these datasets has grown considerably. Spatial generalized linear mixed models (SGLMMs) are a flexible class of models for nonstationary and non-Gaussian datasets. Despite their utility, SGLMMs can be computationally prohibitive for even moderately large datasets. To circumvent this issue, past studies have embedded nested radial basis functions into the SGLMM. However, two crucial specifications (knot placement and bandwidth parameters), which directly affect model performance, are typically fixed prior to model fitting. We propose a novel approach to modeling large nonstationary and non-Gaussian spatial datasets using adaptive radial basis functions. Our approach: (1) partitions the spatial domain into subregions; (2) employs reversible-jump Markov chain Monte Carlo (RJMCMC) to infer the number and location of the knots within each partition; and (3) models the latent spatial surface using partition-varying and adaptive basis functions. Through an extensive simulation study, we show that our approach provides more accurate predictions than competing methods while preserving computational efficiency. We demonstrate our approach on two environmental datasets: incidences of plant species and counts of bird species in the United States.
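To make the role of knots and bandwidths concrete, the following is a minimal sketch of a partition-varying radial basis matrix. It is not the authors' implementation: the Gaussian kernel form, the regular 2 x 2 partition of the unit square, the fixed knot counts, and the helper names (`gaussian_rbf`, `partition_labels`, `partitioned_basis`) are all assumptions made for illustration. In the proposed approach the number and location of knots would be inferred by RJMCMC rather than fixed as they are here.

```python
import numpy as np


def gaussian_rbf(locations, knots, bandwidth):
    """Return an (n_locations, n_knots) matrix of Gaussian kernel values.

    Entry (i, j) is exp(-||s_i - k_j||^2 / (2 * bandwidth^2)).
    The Gaussian form is an assumption; the abstract does not specify it.
    """
    diffs = locations[:, None, :] - knots[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def partition_labels(locations, n_side):
    """Assign each point in the unit square to a cell of an n_side x n_side grid."""
    cells = np.minimum((locations * n_side).astype(int), n_side - 1)
    return cells[:, 0] * n_side + cells[:, 1]


def partitioned_basis(locations, knots_per_part, bandwidths, n_side):
    """Stack per-partition basis blocks; each block is zero outside its partition."""
    labels = partition_labels(locations, n_side)
    blocks = []
    for p, (knots, bw) in enumerate(zip(knots_per_part, bandwidths)):
        block = gaussian_rbf(locations, knots, bw)
        block[labels != p] = 0.0  # basis functions act only within partition p
        blocks.append(block)
    return np.hstack(blocks)


rng = np.random.default_rng(0)
n_side = 2
locations = rng.uniform(size=(200, 2))

# Hypothetical fixed knot counts and bandwidths, one per partition.
knot_counts = (3, 5, 4, 6)
bandwidths = [0.20, 0.10, 0.15, 0.05]
knots_per_part = []
for p, k in enumerate(knot_counts):
    i, j = divmod(p, n_side)
    # Place the knots uniformly at random inside cell (i, j).
    knots_per_part.append((np.array([i, j]) + rng.uniform(size=(k, 2))) / n_side)

Phi = partitioned_basis(locations, knots_per_part, bandwidths, n_side)
print(Phi.shape)
```

The resulting matrix has one row per location and one column per knot, so the latent spatial surface can be written as `Phi @ weights` for a coefficient vector `weights`. Changing a partition's knot set or bandwidth changes only that partition's block of columns, which is what makes per-partition updates cheap in a sketch like this.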

