Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) provides vital insights into genomic locations with differential DNA occupancy between experimental states. However, because ChIP-Seq data are collected experimentally and are subject to technical variation between samples, they must be normalized between samples before differential binding analysis can properly assess which genomic regions have differential DNA occupancy. Although between-sample normalization is crucial for downstream differential binding analysis, the technical conditions underlying between-sample ChIP-Seq normalization methods have yet to be specifically examined. We identify three important technical conditions underlying ChIP-Seq between-sample normalization methods: symmetric differential DNA occupancy, equal total DNA occupancy, and equal background binding across experimental states. We categorize popular ChIP-Seq normalization methods by their technical conditions and simulate ChIP-Seq read count data to illustrate why satisfying a normalization method's technical conditions matters for downstream differential binding analysis. To externally verify our simulation findings, we assess the similarity between normalization methods in experimental CUT&RUN data. Our simulation and experimental results underscore that satisfying the technical conditions underlying the selected between-sample normalization method is crucial for conducting biologically meaningful downstream differential binding analysis. We suggest that researchers use their understanding of the ChIP-Seq experiment at hand to guide their choice of between-sample normalization method when possible. When it is uncertain which technical conditions are met, researchers could use the intersection of the differentially bound peaksets derived from different normalization methods to determine which regions have differential DNA occupancy between experimental states.
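The intersection approach suggested for cases of uncertainty can be sketched as a plain set intersection. This is a minimal Python sketch, not the authors' pipeline. It assumes that every normalization method was applied to the same consensus peakset, so a given region has identical (chromosome, start, end) coordinates under every method, and that each method's differential binding results have already been reduced to a set of significant regions. The method labels (`library_size`, `background_bins`, `rle`) and all coordinates are hypothetical toy values.

```python
from typing import Dict, Set, Tuple

# A genomic region as (chromosome, start, end). Assumption: coordinates come
# from one shared consensus peakset, so the same region compares equal across methods.
Region = Tuple[str, int, int]


def consensus_db_regions(db_by_method: Dict[str, Set[Region]]) -> Set[Region]:
    """Regions called differentially bound under every normalization method."""
    peaksets = list(db_by_method.values())
    if not peaksets:
        return set()
    consensus = set(peaksets[0])  # copy so the caller's set is not mutated
    for peakset in peaksets[1:]:
        consensus &= peakset  # keep only regions also called by this method
    return consensus


def method_specific(db_by_method: Dict[str, Set[Region]]) -> Dict[str, Set[Region]]:
    """For each method, the regions it calls that are not in the consensus."""
    consensus = consensus_db_regions(db_by_method)
    return {method: set(regions) - consensus for method, regions in db_by_method.items()}


# Hypothetical toy results: method labels and coordinates are illustrative only.
db = {
    "library_size": {("chr1", 100, 600), ("chr2", 50, 400), ("chr3", 10, 300)},
    "background_bins": {("chr1", 100, 600), ("chr2", 50, 400)},
    "rle": {("chr1", 100, 600), ("chr2", 50, 400), ("chr5", 0, 200)},
}

consensus = consensus_db_regions(db)
print(sorted(consensus))  # [('chr1', 100, 600), ('chr2', 50, 400)]
print({m: sorted(r) for m, r in method_specific(db).items()})
```

Regions that appear in only some methods' peaksets are those whose differential status depends on which technical conditions hold; `method_specific` lists them per method so they can be inspected rather than silently discarded. If each method's peaks were called separately with differing coordinates, an overlap-based interval intersection (for example, with BEDTools `intersect`) would be needed instead of exact coordinate matching.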