Identifying both spatially contiguous clusters and repeated spatial patterns (RSP), that is, spatially separated regions that share a similar underlying distribution, is a key challenge in modern spatial statistics. Existing constrained clustering methods enforce spatial contiguity but are limited in their ability to identify RSP. We propose a novel nonparametric framework that addresses this limitation by combining constrained clustering with a post-clustering reassignment step based on the maximum mean discrepancy (MMD) statistic. To approximate the null distribution of the MMD, we employ a block permutation strategy within each cluster that preserves local attribute structure. We also show that the MMD$^2$ statistic is asymptotically consistent under second-order stationarity and spatial mixing conditions. This two-stage approach enables the detection of clusters that are spatially distant yet similar in distribution. Through simulation studies that vary spatial dependence, cluster size, cluster shape, and multivariate dimensionality, we demonstrate the robustness of the proposed framework in detecting RSP. We further illustrate its applicability through an analysis of spatial proteomics data from patients with triple-negative breast cancer. Overall, our framework represents a methodological advance in spatial clustering, offering a flexible and robust solution for spatial datasets that exhibit repeated patterns.
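To make the reassignment criterion concrete, the following is a minimal sketch of an MMD$^2$ comparison between two clusters with a block-wise permutation p-value. It is an illustration under stated assumptions, not the implementation used in this work: the Gaussian kernel, the fixed `bandwidth`, the biased (V-statistic) estimator, and the particular scheme of reassigning whole blocks between the two clusters are all choices made here for the example, and the function names and the block labels `blocks_x` and `blocks_y` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (
        gaussian_kernel(x, x, bandwidth).mean()
        + gaussian_kernel(y, y, bandwidth).mean()
        - 2.0 * gaussian_kernel(x, y, bandwidth).mean()
    )


def block_permutation_pvalue(x, y, blocks_x, blocks_y,
                             n_perm=200, bandwidth=1.0, seed=0):
    """Permutation p-value for MMD^2 that moves whole blocks, not single points.

    Assumption for this sketch: observations sharing a block label are
    spatial neighbours, so keeping them together preserves local structure.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])

    # Relabel blocks as 0..k-1 so ids from the two clusters never collide.
    _, bx = np.unique(blocks_x, return_inverse=True)
    _, by = np.unique(blocks_y, return_inverse=True)
    n_bx = bx.max() + 1
    block_ids = np.concatenate([bx, by + n_bx])
    n_blocks = block_ids.max() + 1

    observed = mmd2(x, y, bandwidth)
    exceed = 0
    for _ in range(n_perm):
        # Randomly pick n_bx blocks to play the role of the first cluster.
        chosen = rng.choice(n_blocks, size=n_bx, replace=False)
        in_first = np.isin(block_ids, chosen)
        stat = mmd2(pooled[in_first], pooled[~in_first], bandwidth)
        if stat >= observed:
            exceed += 1

    # The +1 correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

In a two-stage pipeline of the kind described above, a large p-value for a pair of spatially distant clusters would be read as evidence that they share a distribution and could be merged under one label, while a small p-value would keep them separate. The significance threshold and any multiple-comparison adjustment are left open here.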