Sequencing technologies have revolutionised the field of molecular biology. We can now routinely capture the complete RNA profile of tissue samples. This wealth of data allows RNA levels to be compared across time points, shedding light on the dynamics of developmental processes, and across environmental conditions, providing insights into gene expression regulation and stress responses. However, because the data vary inherently for both biological and technological reasons, quantifying changes in gene expression is a statistical challenge. Here, we present a closed-form Bayesian solution to this problem. Our approach is tailored to the differential gene expression analysis of processed RNA-Seq data. The framework unifies and streamlines an otherwise complex analysis, which typically involves parameter estimation and multiple statistical tests, into a single concise equation for calculating Bayes factors. Using conjugate priors, we can solve the equations analytically. For each gene, we calculate a Bayes factor, which can be used to rank genes according to the statistical evidence for a change in the gene's expression given the RNA-Seq data. The presented closed-form solution is derived under minimal assumptions and may be applied to a variety of other two-sample problems.
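To illustrate how conjugate priors can lead to a closed-form Bayes factor in a two-sample comparison, the following minimal sketch may help. It is not the framework's own equation. It assumes, purely for illustration, that processed expression values are normally distributed with unknown mean and variance under a Normal-Inverse-Gamma prior. The model, the hyperparameter defaults (`mu0`, `kappa0`, `alpha0`, `beta0`), the function names, and the example values are all hypothetical.

```python
import math


def log_marginal_likelihood(x, mu0=0.0, kappa0=0.01, alpha0=1.0, beta0=1.0):
    """Log evidence of i.i.d. normal data under a Normal-Inverse-Gamma prior.

    Assumed model (illustrative only):
        sigma^2 ~ InvGamma(alpha0, beta0)
        mu | sigma^2 ~ Normal(mu0, sigma^2 / kappa0)
    """
    n = len(x)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)  # sum of squared deviations

    # Standard conjugate posterior updates.
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)

    # Closed-form marginal likelihood, evaluated on the log scale.
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))


def log_bayes_factor(x1, x2, **prior):
    """Log Bayes factor: separate parameters per condition vs. shared ones."""
    separate = (log_marginal_likelihood(x1, **prior)
                + log_marginal_likelihood(x2, **prior))
    shared = log_marginal_likelihood(list(x1) + list(x2), **prior)
    return separate - shared


def rank_genes(data, **prior):
    """Sort genes by decreasing evidence for an expression change."""
    scores = {g: log_bayes_factor(a, b, **prior) for g, (a, b) in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical log-scale expression values for two conditions per gene.
example = {
    "gene_changed": ([5.0, 5.1, 4.9, 5.2], [8.0, 8.1, 7.9, 8.2]),
    "gene_stable": ([5.0, 5.1, 4.9, 5.2], [5.05, 4.95, 5.1, 5.0]),
}
for gene, log_bf in rank_genes(example):
    print(f"{gene}: log BF = {log_bf:.2f}")
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions. A positive log Bayes factor favours a difference between the two conditions, and a negative one favours no difference. The result depends on the chosen hyperparameters, so the defaults here should be treated as placeholders rather than recommendations.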