Given a set \emph{S} of spatial feature types, their feature instances, a study area, and a neighbor relationship, the goal is to find pairs $\langle$a region ($r_{g}$), a subset \emph{C} of \emph{S}$\rangle$ such that \emph{C} is a statistically significant regional colocation pattern in $r_{g}$. This problem is important for applications in domains including ecology, economics, and sociology. It is computationally challenging because the numbers of regional colocation patterns and candidate regions are both exponential. Previously, we proposed a miner \cite{10.1145/3557989.3566158} that finds statistically significant regional colocation patterns. However, the many simultaneous statistical inferences it performs raise the risk of false discoveries (the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, the multiple comparisons regional colocation miner (MultComp-RCM), which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and the computational cost.
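
For reference, the standard Bonferroni correction can be stated as follows. This is a generic sketch, not the exact procedure used inside MultComp-RCM: the symbols $m$ (the number of simultaneous tests), $\alpha$ (the target family-wise significance level), $p_{i}$ (the $p$-value of the $i$-th test), and $m_{0}$ (the number of true null hypotheses) are assumptions introduced here for illustration. Each null hypothesis $H_{i}$ is rejected only if
\[
p_{i} \le \frac{\alpha}{m}, \qquad i = 1, \dots, m ,
\]
and the union bound then gives
\[
\Pr(\text{at least one false rejection}) \;\le\; \sum_{i \,:\, H_{i} \text{ true}} \Pr\!\left(p_{i} \le \frac{\alpha}{m}\right) \;\le\; m_{0}\,\frac{\alpha}{m} \;\le\; \alpha .
\]
As a hypothetical example, with $\alpha = 0.05$ and $m = 100$ tests, each individual test would be carried out at level $0.05 / 100 = 0.0005$.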