In a range of genomic applications, it is of interest to quantify the evidence that the signal at site~$i$ is active, given conditionally independent replicate observations summarized at each site by the sample mean and variance $(\bar Y, s^2)$. We study the version of the problem in which the signal distribution is sparse and the error distribution has an unknown site-specific variance, so that the null distribution of the standardized statistic is Student-$t$ rather than Gaussian. The main contribution of this paper is a sparse-mixture approximation to the non-null density of the $t$-ratio. This formula demonstrates the effect of low degrees of freedom on the Bayes factor and on the conditional probability that the site is active. We illustrate some of the resulting differences using an HIV gene-expression dataset previously analyzed by Efron (2012).
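As a rough illustration of why low degrees of freedom matter, the sketch below compares the Student-$t$ null density with the standard Gaussian density at the same value of the statistic. It is not the sparse-mixture approximation developed in the paper; the function names (`student_t_pdf`, `gaussian_pdf`, `null_density_ratio`) and the evaluation point $t = 4$ are assumptions chosen purely for illustration. The larger the ratio, the more a Gaussian null would understate the null density in the tail, and so overstate the evidence that a site is active.

```python
import math


def student_t_pdf(t, df):
    """Density of the central Student-t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def gaussian_pdf(z):
    """Density of the standard Gaussian distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def null_density_ratio(t, df):
    """Ratio of the Student-t null density to the N(0, 1) density at t.

    Illustrative helper (hypothetical name): values well above 1 mean that
    a Gaussian null makes the observed statistic look rarer than it is.
    """
    return student_t_pdf(t, df) / gaussian_pdf(t)


# Compare the two null densities at t = 4 for a few degrees of freedom.
for df in (3, 10, 100):
    print(f"df={df:>3}  t-null / Gaussian-null density at t=4: "
          f"{null_density_ratio(4.0, df):.2f}")
```

The ratio shrinks towards 1 as the degrees of freedom grow, which is consistent with the Student-$t$ null converging to the Gaussian null when the site-specific variance is well estimated.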