Modern high-throughput sequencing assays efficiently capture not only gene expression and different levels of gene regulation but also a multitude of genome variants. Focused analysis of alternative alleles at variable sites on homologous chromosomes of the human genome reveals allele-specific gene expression and allele-specific gene regulation by assessing the allelic imbalance of read counts at individual sites. Here we formally describe MIXALIME, an advanced statistical framework for detecting allelic imbalance in read counts at single-nucleotide variants detected in diverse omics studies (ChIP-Seq, ATAC-Seq, DNase-Seq, CAGE-Seq, and others). MIXALIME accounts for copy-number variants, aneuploidy, and reference read mapping bias, and provides several scoring models to balance sensitivity and specificity when scoring data with varying levels of overdispersion caused by experimental noise.