A robust, scalable K-statistic for quantifying immune cell clustering in spatial proteomics data

Spatial summary statistics based on point process theory are widely used to quantify the spatial organization of cell populations in single-cell spatial proteomics data. Among these, Ripley's $K$ is a popular metric for assessing whether cells are spatially clustered or randomly dispersed. However, its key assumption of spatial homogeneity is frequently violated in spatial proteomics data, leading to overestimates of cell clustering and colocalization. To address this, we propose a novel $K$-based method, termed \textit{KAMP} (\textbf{K} adjustment by \textbf{A}nalytical \textbf{M}oments of the \textbf{P}ermutation distribution), for quantifying the spatial organization of cells in spatial proteomics samples. \textit{KAMP} leverages the background cells in each sample, along with a new closed-form representation of the first and second moments of the permutation distribution of Ripley's $K$, to estimate an empirical null model. Our method is robust to inhomogeneity, computationally efficient even in large datasets, and provides approximate $p$-values for testing spatial clustering and colocalization. Methodological developments are motivated by a spatial proteomics study of 103 women with ovarian cancer, where our analysis using \textit{KAMP} shows a positive association between immune cell clustering and overall patient survival. Notably, we also find evidence that using $K$ without correcting for sample inhomogeneity may bias hazard ratio estimates in downstream analyses. \textit{KAMP} completes this analysis in just 5 minutes, compared to 538 minutes for the only competing method that adequately addresses inhomogeneity.
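
To make the idea of a permutation-based empirical null concrete, the following is a minimal sketch of the brute-force procedure that a closed-form approach would replace. It is not the \textit{KAMP} formula: the null mean and variance of $K$ are approximated here by Monte Carlo relabeling of cells rather than computed analytically. The function names (`within_radius`, `ripley_k`, `permutation_adjusted_k`), the omission of edge correction, and the one-sided normal approximation for the $p$-value are all assumptions made for illustration.

```python
import math
import numpy as np


def within_radius(coords, r):
    """Boolean matrix: True where two cells lie within distance r."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)) <= r


def ripley_k(within, mask, area):
    """Naive Ripley's K for the cells selected by `mask` (no edge correction)."""
    n = int(np.sum(mask))
    sub = within[mask][:, mask]
    pairs = sub.sum() - n  # drop the diagonal (each cell paired with itself)
    return area * pairs / (n * (n - 1))


def permutation_adjusted_k(coords, is_target, r, area, n_perm=200, seed=0):
    """Observed K minus a permutation-null mean, with a z-score and p-value.

    Labels are shuffled across all cells (target and background), so the
    null inherits the inhomogeneity of the tissue itself. The normal
    approximation used for the p-value is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    within = within_radius(coords, r)
    observed = ripley_k(within, is_target, area)
    null = np.empty(n_perm)
    for b in range(n_perm):
        null[b] = ripley_k(within, rng.permutation(is_target), area)
    diff = observed - null.mean()
    z = diff / null.std(ddof=1)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return diff, z, p


# Toy sample: uniform background cells plus a tight cluster of target cells.
rng = np.random.default_rng(1)
background = rng.uniform(0.0, 1.0, size=(200, 2))
targets = 0.5 + 0.02 * rng.standard_normal((30, 2))
coords = np.vstack([background, targets])
is_target = np.r_[np.zeros(200, dtype=bool), np.ones(30, dtype=bool)]

diff, z, p = permutation_adjusted_k(coords, is_target, r=0.1, area=1.0)
print(f"adjusted K = {diff:.3f}, z = {z:.1f}, p = {p:.3g}")
```

Each permutation in this sketch recomputes $K$ from scratch, so the cost grows linearly with the number of permutations. That repeated work is what a closed-form expression for the permutation moments avoids.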
