The modifiable areal unit problem in geography, or the change-of-support (COS) problem in statistics, demonstrates that the interpretation of a spatial (or spatio-temporal) data analysis depends on the resolution or geographical units used in the study. The ecological fallacy is one famous example of this phenomenon. Here we investigate the ecological fallacy associated with the COS problem for multivariate spatial data, with the goal of providing a data-driven discretization criterion for the domain of interest that minimizes aggregation errors. The discretization is based on a novel multiscale metric, called the Multivariate Criterion for Aggregation Error (MVCAGE). Such multiscale representations of an underlying multivariate process are often formulated in terms of basis expansions. We show that a particularly useful basis expansion in this context is the multivariate Karhunen-Loève expansion (MKLE). We use the MKLE to build the MVCAGE loss function and use it within the framework of spatial clustering algorithms to perform optimal spatial aggregation. We demonstrate the effectiveness of our approach through simulation, through regionalization of county-level income and hospital quality data over the United States, and through prediction of ocean color in the coastal Gulf of Alaska.
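To make the idea of a basis-expansion-based aggregation error concrete, the following is a minimal sketch and not the MVCAGE definition from the paper. It assumes a bivariate process on a one-dimensional grid with a separable exponential cross-covariance, uses the eigendecomposition of the stacked covariance matrix as a discrete stand-in for the MKLE, and scores a partition by the eigenvalue-weighted squared difference between each eigenvector and its within-region average. All function names (`exp_cov`, `discrete_mkle`, `aggregation_error`) and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def exp_cov(coords, range_):
    """Exponential covariance between 1-D locations (illustrative choice)."""
    d = np.abs(coords[:, None] - coords[None, :])
    return np.exp(-d / range_)


def discrete_mkle(cov, n_terms):
    """Leading eigenpairs of a stacked multivariate covariance matrix."""
    vals, vecs = np.linalg.eigh(cov)            # eigh returns ascending order
    order = np.argsort(vals)[::-1][:n_terms]    # keep the largest n_terms
    return vals[order], vecs[:, order]


def aggregation_error(vals, vecs, labels, n_vars):
    """Eigenvalue-weighted squared gap between fine-scale eigenvectors
    and their region-wise averages, per site (a CAGE-like score)."""
    n_sites = labels.size
    # Rows are ordered as variable * n_sites + site, matching np.kron(B, K).
    phi = vecs.reshape(n_vars, n_sites, -1)
    agg = np.empty_like(phi)
    for region in np.unique(labels):
        mask = labels == region
        agg[:, mask, :] = phi[:, mask, :].mean(axis=1, keepdims=True)
    sq_gap = ((phi - agg) ** 2).sum(axis=(0, 1))  # one value per basis term
    return float((vals * sq_gap).sum() / n_sites)


n_sites = 60
coords = np.linspace(0.0, 1.0, n_sites)
K = exp_cov(coords, range_=0.2)                  # spatial covariance
B = np.array([[1.0, 0.6], [0.6, 1.0]])           # assumed cross-variable covariance
cov = np.kron(B, K)                              # separable bivariate covariance

vals, vecs = discrete_mkle(cov, n_terms=10)

coarse = np.arange(n_sites) // 20                # 3 regions
fine = np.arange(n_sites) // 5                   # 12 regions, nested in coarse
singletons = np.arange(n_sites)                  # no aggregation at all

err_coarse = aggregation_error(vals, vecs, coarse, n_vars=2)
err_fine = aggregation_error(vals, vecs, fine, n_vars=2)
err_none = aggregation_error(vals, vecs, singletons, n_vars=2)

print(f"coarse: {err_coarse:.4f}  fine: {err_fine:.4f}  none: {err_none:.4f}")
```

In this sketch a refinement of a partition never increases the score, because region-wise averaging is an orthogonal projection onto piecewise-constant vectors. A spatial clustering algorithm could use a score of this kind as its loss when choosing among partitions with a fixed number of regions.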