We develop a scan statistic method for detecting local clusters in a two-sample nonhomogeneous Poisson process (NHPP) framework, motivated by copy number variation (CNV) analysis in next-generation sequencing data. The control sample is used to construct an empirical time transformation, under which the transformed case sample is approximately uniform on [0,1] under the null hypothesis. The scan statistic is defined as the maximum number of transformed points within a moving window. We show that the scan statistic converges to a generalized extreme value (GEV) distribution with an extremal index that captures the dependence induced by overlapping windows. The GEV parameters and extremal index are estimated using maximum likelihood and exceedance clustering methods, providing an asymptotic calibration of the test. A permutation procedure is also developed to provide a nonparametric alternative. Simulation studies show that the permutation calibration maintains empirical Type I error close to the nominal level across the considered settings, and the GEV calibration is accurate for smaller windows. Both proposed procedures show competitive power compared with the continuous testing method under heterogeneous baseline intensities. An application to sequencing data illustrates the effectiveness of the proposed approach for detecting CNV regions.
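As a rough illustration of the pipeline described in the abstract, the following Python sketch maps case points to [0,1] using the empirical distribution function of the control points, takes the maximum count over a moving window of width w, and calibrates that maximum by permutation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the closed-window convention, and the permutation scheme (random relabeling of the pooled case and control points) are all hypothetical choices made here for illustration. The GEV calibration with an extremal index is not shown.

```python
import numpy as np


def empirical_time_transform(control, case):
    """Map case points to [0, 1] via the empirical CDF of the control points."""
    control = np.sort(np.asarray(control, dtype=float))
    # Fraction of control points less than or equal to each case point.
    return np.searchsorted(control, case, side="right") / control.size


def scan_statistic(u, w):
    """Maximum number of points of u in any closed window [t, t + w]."""
    u = np.sort(np.asarray(u, dtype=float))
    # The maximum is attained by a window whose left end sits on a point,
    # so it suffices to count the points in [u_i, u_i + w] for each i.
    right = np.searchsorted(u, u + w, side="right")
    return int(np.max(right - np.arange(u.size)))


def permutation_pvalue(control, case, w, n_perm=200, seed=0):
    """Permutation p-value for the scan statistic.

    Assumption: labels are permuted over the pooled sample; the paper's
    actual permutation procedure may differ.
    """
    rng = np.random.default_rng(seed)
    observed = scan_statistic(empirical_time_transform(control, case), w)
    pooled = np.concatenate([control, case])
    n_case = len(case)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        # First n_case permuted points play the role of cases, the rest controls.
        u = empirical_time_transform(perm[n_case:], perm[:n_case])
        exceed += scan_statistic(u, w) >= observed
    # Add-one correction keeps the p-value strictly positive.
    return observed, (1 + exceed) / (1 + n_perm)
```

For example, calling `permutation_pvalue(control, case, w=0.02)` on two arrays of genomic positions returns the observed scan statistic and its permutation p-value. Because the empirical transform takes values on a grid of size 1/m for m control points, windows much narrower than 1/m mostly count ties, so w would normally be chosen with the control sample size in mind.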