The modifiable areal unit problem in geography, or the change-of-support (COS) problem in statistics, demonstrates that the interpretation of spatial (or spatio-temporal) data analysis is affected by the choice of resolution or geographical units used in the study. The ecological fallacy is one well-known example of this phenomenon. Here we investigate the ecological fallacy associated with the COS problem for multivariate spatial data, with the goal of providing a data-driven criterion for discretizing the domain of interest that minimizes aggregation error. The discretization is based on a novel multiscale metric, the Multivariate Criterion for Aggregation Error (MVCAGE). Multiscale representations of an underlying multivariate process are often formulated in terms of basis expansions, and we show that a particularly useful basis expansion in this context is the multivariate Karhunen-Loève expansion (MKLE). We use the MKLE to build the MVCAGE loss function and embed it within spatial clustering algorithms to perform optimal spatial aggregation. We demonstrate the effectiveness of our approach through simulation, through regionalization of county-level income and hospital quality data over the United States, and through prediction of ocean color in the coastal Gulf of Alaska.
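To make the idea of an MKLE-based aggregation error concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' method. It uses a discrete, sample-based analog of the MKLE (an eigendecomposition of the joint covariance of all variables at all locations) and scores a partition of locations by the expected within-region squared deviation implied by the truncated expansion. For a zero-mean process $Y_i(s) = \sum_k \alpha_k \phi_{ik}(s)$ with uncorrelated coefficients of variance $\lambda_k$, this deviation is $\sum_k \lambda_k (\phi_{ik}(s) - \bar\phi_{ik}(B))^2$. The helper names `multivariate_kl_basis` and `aggregation_error`, the variable-major data layout, the equal weighting of locations, and the toy data are all hypothetical; the paper's exact MVCAGE definition and normalization may differ, and the spatial clustering step is omitted.

```python
import numpy as np


def multivariate_kl_basis(samples, n_locations, n_vars, n_terms):
    """Hypothetical helper: empirical (discrete) multivariate KL basis.

    samples: shape (m, n_vars * n_locations); each row stacks the variables
    variable-major: [Y_1(s_1..s_n), Y_2(s_1..s_n), ...].
    Returns the n_terms largest eigenvalues and the matching eigenvectors
    reshaped to (n_vars, n_locations, n_terms).
    """
    centered = samples - samples.mean(axis=0)
    cov = centered.T @ centered / (samples.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigvals)[::-1][:n_terms]     # keep the largest
    lam = np.clip(eigvals[order], 0.0, None)        # guard tiny negatives
    phi = eigvecs[:, order].reshape(n_vars, n_locations, n_terms)
    return lam, phi


def aggregation_error(lam, phi, labels):
    """Hypothetical MVCAGE-style score (assumption, not the paper's formula).

    Sums lam_k * (phi_ik(s) - mean_{s' in B} phi_ik(s'))**2 over basis
    terms k, variables i, locations s, and regions B, i.e. the expected
    squared deviation of each variable from its region average.
    """
    labels = np.asarray(labels)
    total = 0.0
    for region in np.unique(labels):
        mask = labels == region
        block = phi[:, mask, :]                          # (p, |B|, K)
        dev = block - block.mean(axis=1, keepdims=True)  # deviation from region mean
        total += float(np.sum(lam * dev ** 2))           # lam broadcasts over K
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_loc, n_var, m = 6, 2, 500
    s = np.linspace(0.0, 1.0, n_loc)
    # Toy data (hypothetical): a shared random ramp observed by two variables.
    ramp = rng.standard_normal((m, 1)) * s                  # (m, n_loc)
    y1 = ramp + 0.1 * rng.standard_normal((m, n_loc))
    y2 = 0.5 * ramp + 0.1 * rng.standard_normal((m, n_loc))
    samples = np.hstack([y1, y2])                            # variable-major
    lam, phi = multivariate_kl_basis(samples, n_loc, n_var, n_terms=3)
    print("contiguous: ", aggregation_error(lam, phi, [0, 0, 0, 1, 1, 1]))
    print("interleaved:", aggregation_error(lam, phi, [0, 1, 0, 1, 0, 1]))
```

Because merging regions can only increase a within-region sum of squares, a clustering algorithm using a score like this would compare partitions with the same number of regions. For the ramp-shaped toy signal, the contiguous split is expected to score lower than the interleaved one, since it groups locations with similar basis-function values.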