Recent advances in single-cell technologies have deepened our understanding of gene regulation and cellular heterogeneity at single-cell resolution. Single-cell data capture both gene expression levels and the proportion of expressing cells, which makes them structurally different from bulk data. Methodological work on causal mediation analysis for single-cell data remains limited, and existing approaches often require specific distributional assumptions. To address this gap, we present QuasiMed, a mediation framework tailored to single-cell data. The proposed method comprises three steps: (i) screening mediator candidates through penalized regression and marginal models (similar to sure independence screening), (ii) estimating indirect effects through the average expression and the proportion of expressing cells, and (iii) hypothesis testing with multiplicity control. The key benefit of QuasiMed is that it specifies only the mean functions of the mediation models through a quasi-regression framework, thereby relaxing strict distributional assumptions. We evaluated the method in real-data-inspired simulations, where it demonstrated high power, false discovery rate control, and computational efficiency. Finally, we applied QuasiMed to ROSMAP single-cell data to illustrate its potential to identify mediating causal pathways. An R package is freely available on GitHub at https://github.com/sjahnn/QuasiMed.
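To make the estimation and testing steps concrete, the following is a minimal Python sketch of a generic product-of-coefficients mediation analysis with marginal models. It is not the QuasiMed implementation or its API. It assumes ordinary least squares in place of the quasi-regression mean models, a joint-significance (max-P) test with a normal approximation, and Benjamini-Hochberg adjustment as the multiplicity control, none of which are specified in the abstract. All function names and the simulated data are hypothetical.

```python
import math
import random


def ols_slope(x, y, extra_params=0):
    """Slope and standard error for y ~ 1 + x.

    extra_params counts parameters already removed by residualization,
    so the residual degrees of freedom stay correct.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    rss = sum((yi - mean_y - slope * (xi - mean_x)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2 - extra_params) / sxx)
    return slope, se


def residualize(x, y):
    """Residuals of y after regressing on an intercept and x."""
    slope, _ = ols_slope(x, y)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return [yi - mean_y - slope * (xi - mean_x) for xi, yi in zip(x, y)]


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b and a joint-significance p-value."""
    a, se_a = ols_slope(x, m)  # exposure -> mediator
    # Frisch-Waugh-Lovell: the coefficient of m in y ~ 1 + x + m equals the
    # slope of resid(y | x) on resid(m | x).
    b, se_b = ols_slope(residualize(x, m), residualize(x, y), extra_params=1)
    p = max(two_sided_p(a / se_a), two_sided_p(b / se_b))
    return a * b, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def run_demo(seed=0, n=200, n_mediators=5):
    """Simulate toy data in which only mediators 0 and 1 carry an effect."""
    rng = random.Random(seed)
    x = [float(rng.random() < 0.5) for _ in range(n)]
    a_true = [1.0, 1.0] + [0.0] * (n_mediators - 2)
    b_true = [0.8, 0.8] + [0.0] * (n_mediators - 2)
    mediators = [[a * xi + rng.gauss(0, 1) for xi in x] for a in a_true]
    y = [
        0.5 * x[i] + sum(b * m[i] for b, m in zip(b_true, mediators)) + rng.gauss(0, 1)
        for i in range(n)
    ]
    results = [indirect_effect(x, m, y) for m in mediators]
    adjusted = benjamini_hochberg([p for _, p in results])
    return [(effect, q) for (effect, _), q in zip(results, adjusted)]


if __name__ == "__main__":
    for j, (effect, q) in enumerate(run_demo()):
        print(f"mediator {j}: indirect effect = {effect:.3f}, adjusted p = {q:.3g}")
```

In a single-cell setting, each gene would contribute two subject-level mediator summaries (its average expression and its proportion of expressing cells), and each could be passed through `indirect_effect` in the same way. The sketch leaves that aggregation, and the screening step, out.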