Cell type deconvolution is a computational method that estimates the proportions of different cell types within bulk transcriptomics data by leveraging information from reference single-cell RNA sequencing data. Although it originated as a simple linear regression model, the approach is complicated by technical and biological variability and by biases between the bulk and single-cell datasets. While several new methods have been developed, most provide only point estimates of cell type proportions and neglect the uncertainty inherent in these estimates. Consequently, comparisons of cell type proportions across multiple individuals can yield false positives. In this paper, we introduce MEAD, a comprehensive statistical framework for efficient cell type deconvolution. Our approach constructs asymptotically valid confidence intervals for individual cell type proportions, as well as for changes in cell type proportions across multiple individuals. Our analysis accounts for factors such as biological variability in gene expression, gene-gene dependence, cross-platform biases, and sequencing errors, without relying on parametric assumptions about the data distributions. Moreover, we establish necessary and sufficient conditions for identifying cell type proportions in the presence of platform-specific biases across sequencing technologies.
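To make the "simple linear regression model" concrete, the following is a minimal sketch of the baseline formulation only, not of MEAD itself. It assumes a bulk expression vector is a proportion-weighted mixture of per-cell-type reference profiles, fits the weights by ordinary least squares, and then clips and renormalizes them. The function name `deconvolve_ols`, the simulated data, and the noise level are all hypothetical choices made for illustration. Note that the output is a point estimate with no measure of uncertainty, which is the limitation described above.

```python
import numpy as np


def deconvolve_ols(bulk, reference):
    """Point-estimate cell type proportions by ordinary least squares.

    bulk:      (G,) bulk expression vector for one individual.
    reference: (G, K) mean expression of each gene in each of K cell types.
    Returns a length-K vector of non-negative weights that sum to 1.
    """
    # Solve min_p ||bulk - reference @ p||^2 without constraints.
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    # Crude post-processing (an assumption of this sketch): clip negative
    # weights to zero, then rescale so the weights sum to 1.
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated coefficients are non-positive")
    return coef / total


# Simulated example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
G, K = 200, 3
reference = rng.gamma(shape=2.0, scale=1.0, size=(G, K))
true_p = np.array([0.6, 0.3, 0.1])

# Noiseless bulk sample: an exact mixture of the reference profiles.
bulk = reference @ true_p
p_hat = deconvolve_ols(bulk, reference)

# Noisy bulk sample: additive Gaussian noise stands in for measurement error.
bulk_noisy = bulk + rng.normal(scale=0.05, size=G)
p_noisy = deconvolve_ols(bulk_noisy, reference)

print("true:     ", true_p)
print("noiseless:", np.round(p_hat, 3))
print("noisy:    ", np.round(p_noisy, 3))
```

In the noiseless case the estimate matches the true proportions; with noise it deviates, and nothing in this baseline indicates by how much. The sketch also ignores gene-gene dependence and cross-platform bias between the reference and the bulk data.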