We consider the problem of boundary detection for areal data, focusing on situations where multiple observations are available for each areal unit. We propose a Bayesian nonparametric mixture model for the area-specific population densities, with spatially dependent weights and a random number of components. Unlike previously proposed methods for boundary detection, which consider one observation per areal unit, ours does not require external information such as area-specific covariates or dissimilarity metrics. Instead, by exploiting information from multiple samples per area, it identifies boundaries between areas that exhibit different densities. Crucially, the number of mixture components must be learned from the data to obtain meaningful boundary detection, because overfitted mixtures are non-identifiable. We therefore treat it as random and place a prior on it. The motivating application is the analysis of economic inequality, which typically leads to social inequality and unrest, in the greater Los Angeles region. Efficient posterior computation is facilitated by a transdimensional Markov chain Monte Carlo sampler that exploits the recently introduced optimal auxiliary priors to improve mixing. The methodology is validated via extensive simulations and applied to income data from the greater Los Angeles region. We identify several boundaries in the income distributions, which can be explained ex post in terms of the percentage of the population without health insurance, though not in terms of the total number of crimes, showing the usefulness of such an analysis to policymakers.