Modern high-throughput sequencing assays efficiently capture not only gene expression and multiple levels of gene regulation but also a multitude of genome variants. Focused analysis of alternative alleles at variable sites on homologous chromosomes of the human genome reveals allele-specific gene expression and allele-specific gene regulation through the allelic imbalance of read counts at individual sites. Here we formally describe MIXALIME, an advanced statistical framework for detecting allelic imbalance in read counts at single-nucleotide variants found in diverse omics studies (ChIP-Seq, ATAC-Seq, DNase-Seq, CAGE-Seq, and others). MIXALIME accounts for copy-number variants, aneuploidy, and reference read mapping bias, and provides several scoring models that balance sensitivity against specificity when scoring data with varying levels of overdispersion caused by experimental noise.
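To make the notion of allelic imbalance at a single site concrete, the following is a minimal sketch of a plain two-sided exact binomial test on reference and alternative read counts. It is a simplified baseline for illustration only and is not one of the MIXALIME scoring models: it ignores overdispersion entirely. The function names and the `p_ref` parameter are hypothetical; `p_ref` stands for the expected reference-allele fraction under no imbalance, which we assume could be shifted away from 0.5 to reflect, for example, a local 2:1 copy-number ratio or reference mapping bias.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1.0 - p))


def imbalance_pvalue(ref_count: int, alt_count: int, p_ref: float = 0.5) -> float:
    """Two-sided exact binomial p-value for allelic imbalance at one site.

    p_ref is the expected reference-allele fraction under the null
    (hypothetical parameter; must lie strictly between 0 and 1).
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    observed = binom_pmf(ref_count, n, p_ref)
    # Sum the probabilities of all outcomes at most as likely as the observed
    # one; the small relative tolerance guards against floating-point ties.
    threshold = observed * (1.0 + 1e-9)
    total = 0.0
    for k in range(n + 1):
        pmf = binom_pmf(k, n, p_ref)
        if pmf <= threshold:
            total += pmf
    return min(1.0, total)


if __name__ == "__main__":
    # Same counts, scored under a balanced null and under an assumed 2:1 null.
    print(imbalance_pvalue(20, 10))
    print(imbalance_pvalue(20, 10, p_ref=2 / 3))
```

Under these assumptions, counts that look imbalanced against a 0.5 null can be entirely unremarkable once the expected allelic ratio is adjusted, which is why accounting for copy number and mapping bias matters before scoring.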