We introduce the saddlepoint approximation-based conditional randomization test (spaCRT), a novel conditional independence test that effectively balances statistical accuracy and computational efficiency, inspired by applications to single-cell CRISPR screens. Resampling-based methods like the distilled conditional randomization test (dCRT) offer statistical precision but at a high computational cost. The spaCRT leverages a saddlepoint approximation to the resampling distribution of the dCRT test statistic, achieving very similar finite-sample statistical performance with significantly reduced computational demands. We prove that the spaCRT p-value approximates the dCRT p-value with vanishing relative error, and that these two tests are asymptotically equivalent. Through extensive simulations and real data analysis, we demonstrate that the spaCRT controls Type-I error and maintains high power, outperforming other asymptotic and resampling-based tests. Our method is particularly well-suited for large-scale single-cell CRISPR screen analyses, facilitating the efficient and accurate assessment of perturbation-gene associations.
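To illustrate the idea of replacing resampling with a saddlepoint approximation, the following is a minimal sketch and not the authors' implementation. It assumes a binary treatment resampled as independent Bernoulli draws with fitted probabilities `mu`, and a test statistic of the form S = sum_i a_i (B_i - mu_i), where the weights `a` stand in for outcome residuals. The function names, this form of the statistic, and the use of the standard Lugannani-Rice tail formula are all assumptions made for illustration.

```python
import math
import random


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cgf_derivs(s, a, mu):
    """Return K(s), K'(s), K''(s) for S = sum_i a_i * (B_i - mu_i),
    with independent B_i ~ Bernoulli(mu_i) (hypothetical setup)."""
    k0 = k1 = k2 = 0.0
    for ai, mi in zip(a, mu):
        x = math.log(mi / (1.0 - mi)) + s * ai
        p = _sigmoid(x)  # success probability under exponential tilting
        k0 += math.log1p(-mi) + _softplus(x) - s * ai * mi
        k1 += ai * (p - mi)
        k2 += ai * ai * p * (1.0 - p)
    return k0, k1, k2


def spa_tail(t, a, mu, n_bisect=100):
    """Lugannani-Rice approximation to P(S >= t); right tail only (t > 0)."""
    if t <= 0:
        raise ValueError("this sketch only handles t > 0")
    upper = sum(ai * (1.0 - mi) if ai > 0 else -ai * mi for ai, mi in zip(a, mu))
    if t >= upper:
        raise ValueError("t must lie strictly inside the support of S")
    # K' is increasing, so solve K'(s) = t by bracketing and bisection.
    lo, hi = 0.0, 1.0
    while cgf_derivs(hi, a, mu)[1] < t:
        hi *= 2.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if cgf_derivs(mid, a, mu)[1] < t:
            lo = mid
        else:
            hi = mid
    s_hat = 0.5 * (lo + hi)
    k0, _, k2 = cgf_derivs(s_hat, a, mu)
    w = math.sqrt(2.0 * (s_hat * t - k0))
    u = s_hat * math.sqrt(k2)
    phi_w = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    normal_tail = 0.5 * math.erfc(w / math.sqrt(2.0))
    return normal_tail + phi_w * (1.0 / u - 1.0 / w)


def resample_tail(t, a, mu, n_resamples, rng):
    """Monte Carlo p-value from resampling the Bernoulli draws directly."""
    hits = 0
    for _ in range(n_resamples):
        s = sum(ai * ((rng.random() < mi) - mi) for ai, mi in zip(a, mu))
        hits += s >= t
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    a = [rng.gauss(0.0, 1.0) for _ in range(100)]     # stand-in residuals
    mu = [rng.uniform(0.05, 0.3) for _ in range(100)]  # stand-in fitted probabilities
    t_obs = 2.0 * math.sqrt(cgf_derivs(0.0, a, mu)[2])  # two standard deviations out
    print("saddlepoint:", spa_tail(t_obs, a, mu))
    print("resampling: ", resample_tail(t_obs, a, mu, 5000, rng))
```

In this sketch the saddlepoint p-value costs one root-finding pass over the data, while the resampling p-value costs one pass per resample. That difference is the computational saving the abstract refers to, and it grows as smaller p-values demand more resamples.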