A robust, scalable K-statistic for quantifying immune cell clustering in spatial proteomics data

Spatial summary statistics based on point process theory are widely used to quantify the spatial organization of cell populations in single-cell spatial proteomics data. Among these, Ripley's $K$ is a popular metric for assessing whether cells are spatially clustered or randomly dispersed. However, its key assumption of spatial homogeneity is frequently violated in spatial proteomics data, leading to overestimates of cell clustering and colocalization. To address this, we propose a novel $K$-based method, termed \textit{KAMP} (\textbf{K} adjustment by \textbf{A}nalytical \textbf{M}oments of the \textbf{P}ermutation distribution), for quantifying the spatial organization of cells in spatial proteomics samples. \textit{KAMP} uses the background cells in each sample, together with a new closed-form representation of the first and second moments of the permutation distribution of Ripley's $K$, to estimate an empirical null model. Our method is robust to inhomogeneity, computationally efficient even in large datasets, and provides approximate $p$-values for testing spatial clustering and colocalization. Methodological developments are motivated by a spatial proteomics study of 103 women with ovarian cancer, where our analysis using \textit{KAMP} shows a positive association between immune cell clustering and overall patient survival. Notably, we also find evidence that using $K$ without correcting for sample inhomogeneity may bias hazard ratio estimates in downstream analyses. \textit{KAMP} completes this analysis in just 5 minutes, compared with 538 minutes for the only competing method that adequately addresses inhomogeneity.
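
To make the idea of a permutation-based empirical null concrete, the following is a minimal Python sketch, not the \textit{KAMP} implementation. The closed-form moments are not given in this abstract, so the sketch estimates the first and second moments of the permutation distribution by brute-force Monte Carlo relabeling of cells, which is the slower procedure that analytical moments would replace. All function names (`neighbor_matrix`, `ripley_k`, `permutation_moments`) are hypothetical, the $K$ estimator omits edge correction, and the toy data are simulated rather than taken from the ovarian cancer study.

```python
import numpy as np

def neighbor_matrix(points, r):
    """Return W with W[i, j] = 1.0 when distinct cells i and j lie within distance r."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = dist <= r
    np.fill_diagonal(w, False)  # a cell is not its own neighbor
    return w.astype(float)

def ripley_k(w, marks, area):
    """Naive Ripley's K (no edge correction) for the cells flagged in `marks`."""
    m = marks.astype(float)
    n = m.sum()
    # m @ w @ m counts ordered pairs of marked cells within distance r
    return area * (m @ w @ m) / (n * (n - 1))

def permutation_moments(w, marks, area, n_perm=200, seed=0):
    """Monte Carlo mean and variance of K when marks are shuffled over all cells.

    Assumption: this simulation stands in for analytical moments,
    which are not reproduced here.
    """
    rng = np.random.default_rng(seed)
    ks = np.array([ripley_k(w, rng.permutation(marks), area) for _ in range(n_perm)])
    return ks.mean(), ks.var(ddof=1)

# Toy sample: 300 cells in the unit square, with the 30 cells nearest the
# center labeled as immune cells, so the immune cells are clearly clustered.
rng = np.random.default_rng(1)
cells = rng.uniform(size=(300, 2))
immune = np.zeros(300, dtype=bool)
immune[np.argsort(np.linalg.norm(cells - 0.5, axis=1))[:30]] = True

w = neighbor_matrix(cells, r=0.1)
k_obs = ripley_k(w, immune, area=1.0)
k_mean, k_var = permutation_moments(w, immune, area=1.0)

# Adjusted statistic: observed K minus its permutation mean, and a z-score
# that could feed an approximate normal p-value.
k_adjusted = k_obs - k_mean
z = k_adjusted / np.sqrt(k_var)
print(f"K = {k_obs:.4f}, permutation mean = {k_mean:.4f}, z = {z:.1f}")
```

Because the null is built by relabeling the cells actually present in the sample, it inherits whatever inhomogeneity the tissue has, rather than comparing against the $\pi r^2$ value expected under complete spatial randomness. The per-permutation loop is the cost that grows with sample size and number of permutations.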
