Bursting cells release ambient RNA that contaminates sequencing data. This contamination is especially problematic in perturbation experiments, where transcription factors are introduced into cells to determine their effects, because it makes it difficult to tell whether a factor is truly expressed in a given cell. This paper analyzes the properties of contaminant noise, showing that the cell-bursting process constrains the shape of the noise distribution across factors. These constraints can be used to improve decontamination by removing counts that are more likely to arise from noise than from expression. In two biological replicates of a perturbation experiment, run across two sequencing protocols, the decontaminated counts agree with bulk genomic measurements of the transduction rate and are automatically corrected for differences between the sequencing protocols.