Cell type deconvolution is a computational approach to inferring the proportions of individual cell types from bulk transcriptomics data. Although many new methods have been developed for cell type deconvolution, most provide only point estimates of the cell type proportions. However, these estimates can be very noisy owing to various sources of bias and randomness, and ignoring their uncertainty may greatly compromise the validity of downstream analyses. In this paper, we propose a comprehensive statistical framework for cell type deconvolution and construct asymptotically valid confidence intervals both for each individual's cell type proportions and for quantifying how cell type proportions change across multiple bulk individuals in downstream regression analyses. Our analysis accounts for various factors, including the biological randomness of gene expression across cells and individuals, gene-gene dependence, cross-platform biases, and sequencing errors, and it avoids any parametric assumptions on the data distributions. We also provide identification conditions for the cell type proportions when there are arbitrary platform-specific biases across sequencing technologies.
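For readers unfamiliar with the setting, the sketch below shows the kind of point estimate the abstract refers to. It assumes a generic linear mixing model, bulk ≈ signature × proportions, where `signature` is a genes × cell-types matrix of reference mean expression, and solves a nonnegative least squares problem by projected gradient descent before rescaling the result to sum to one. The function name `deconvolve_nnls`, the solver choice, and the synthetic data are illustrative assumptions; this is a common baseline, not the estimator or the confidence intervals proposed in the paper.

```python
import numpy as np


def deconvolve_nnls(bulk, signature, n_iter=5000):
    """Hypothetical baseline: point estimate of cell type proportions.

    Assumes bulk ~= signature @ p with p >= 0, solved by projected gradient
    descent on the least squares objective, then rescaled so sum(p) == 1.

    bulk:      (n_genes,) bulk expression for one individual
    signature: (n_genes, n_types) reference expression per cell type
    """
    X = np.asarray(signature, dtype=float)
    y = np.asarray(bulk, dtype=float)
    # Step size 1/L, with L the largest eigenvalue of X^T X
    # (the Lipschitz constant of the least squares gradient).
    L = np.linalg.eigvalsh(X.T @ X).max()
    p = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ p - y)
        # Gradient step followed by projection onto the nonnegative orthant.
        p = np.maximum(p - grad / L, 0.0)
    total = p.sum()
    if total == 0:
        raise ValueError("all estimated weights are zero; cannot normalize")
    return p / total


if __name__ == "__main__":
    # Synthetic, noise-free example (assumed data, for illustration only).
    rng = np.random.default_rng(0)
    sig = rng.uniform(0.1, 1.0, size=(50, 3))
    p_true = np.array([0.5, 0.3, 0.2])
    print(deconvolve_nnls(sig @ p_true, sig))
```

With noise-free synthetic data the estimate recovers the true proportions; with real bulk data it returns a single vector and says nothing about its uncertainty, which is the gap the framework described above is meant to address.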