Inferring the activities of transcription factors from high-throughput transcriptomic or open chromatin profiling data, such as RNA-, CAGE-, or ATAC-Seq, is a long-standing challenge in systems biology. Identifying highly active master regulators enables mechanistic interpretation of differential gene expression, chromatin state changes, or perturbation responses across conditions, cell types, and diseases. Here, we describe MARADONER, a statistical framework and its software implementation for motif activity response analysis (MARA). MARADONER combines sequence-level features, obtained by pattern matching (motif scanning) of individual promoters, with promoter- or gene-level activity or expression estimates. Compared to classic MARA, MARADONER (MARA-done-right) employs unbiased estimation of variance parameters and bias-adjusted likelihood estimation of fixed effects, thereby improving goodness-of-fit and the accuracy of activity estimates. Furthermore, MARADONER can account for heteroscedasticity in motif scores and activity estimates.
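To make the underlying model concrete, the following is a minimal sketch of the classic MARA baseline, not of MARADONER's estimator. It assumes the standard linear model E[p][s] ≈ c[p] + d[s] + Σ_m N[p][m]·A[m][s], where N holds motif scores per promoter, E holds expression or activity per promoter and sample, c and d are promoter and sample offsets, and A holds the unknown motif activities. The offsets are removed by centering, and A is estimated by ridge-regularized least squares, which corresponds to a Gaussian prior on activities. The function names and the penalty `lam` are hypothetical, and this sketch does not implement the unbiased variance-parameter estimation, bias-adjusted fixed-effect estimation, or heteroscedasticity handling described above.

```python
from typing import List

Matrix = List[List[float]]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def solve(A: Matrix, B: Matrix) -> Matrix:
    """Solve A @ X = B with Gauss-Jordan elimination and partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def center_columns(N: Matrix) -> Matrix:
    """Subtract each motif's mean score over promoters."""
    means = [sum(col) / len(N) for col in zip(*N)]
    return [[v - m for v, m in zip(row, means)] for row in N]


def double_center(E: Matrix) -> Matrix:
    """Remove promoter (row) and sample (column) offsets from E."""
    n_rows, n_cols = len(E), len(E[0])
    row_means = [sum(r) / n_cols for r in E]
    col_means = [sum(E[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [[E[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)] for i in range(n_rows)]


def mara_ridge(N: Matrix, E: Matrix, lam: float = 1.0) -> Matrix:
    """Hypothetical ridge estimate of motif activities (motifs x samples).

    Activities are identifiable only up to a per-motif constant across
    samples, so the result is centered over samples for each motif.
    """
    Nc = center_columns(N)
    Ec = double_center(E)
    Nt = transpose(Nc)
    G = matmul(Nt, Nc)  # motif-by-motif Gram matrix
    for i in range(len(G)):
        G[i][i] += lam  # ridge penalty (Gaussian prior on activities)
    return solve(G, matmul(Nt, Ec))
```

With a small penalty and noise-free data, `mara_ridge` recovers the sample-centered activities used to simulate `E`. A large penalty shrinks all activities toward zero. In practice the penalty would be tuned, for example by cross-validation.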