Background: Understanding cellular diversity throughout the body is essential for elucidating the complex functions of biological systems. Large-scale single-cell omics datasets, known as omics atlases, have recently become available. These atlases encompass data from diverse tissues and cell types, providing insights into the landscape of cell-type-specific gene expression. However, the isolated effect of the tissue environment has not been thoroughly investigated. Evaluating this effect is challenging because it is statistically confounded with cell-type effects: tissues and cell types occur in the body in strongly biased combinations. Results: This study introduces a novel data analysis framework, Combinatorial Sub-dataset Extraction for Confounding Reduction (COSER), which addresses statistical confounding by using graph theory to enumerate appropriate sub-datasets. COSER enables the assessment of isolated effects of discrete variables in single cells. Applying COSER to the Tabula Muris Senis single-cell transcriptome atlas, we characterized the isolated impact of tissue environments. Our findings demonstrate that some genes are markedly affected by the tissue environment, particularly in modulating intercellular diversity in immune responses and their age-related changes. Conclusion: COSER provides a robust, general-purpose framework for evaluating the isolated effects of discrete variables through large-scale data mining. This approach reveals critical insights into the interplay between tissue environments and gene expression.