Standard ChIP-seq peak-calling pipelines seek to differentiate biochemically reproducible signals of individual genomic elements from background noise. However, reproducibility alone does not imply functional regulation (e.g., enhancer activation, alternative splicing). Here we present a general-purpose, interpretable machine learning method, signed iterative random forests (siRF), which we use to infer regulatory interactions among transcription factors and functional binding signatures surrounding enhancer elements in Drosophila melanogaster.