Transcription factors (TFs) regulate gene expression through complex, cooperative mechanisms. Although many TFs act together, the logic underlying TF binding and TF interactions is not yet fully understood. Most current approaches to TF binding site prediction focus on individual TFs and binary classification tasks, and do not fully analyze the possible interactions among TFs. In this paper, we investigate DNA TF binding site recognition as a multi-label classification problem, achieving reliable predictions for multiple TFs on DNA sequences retrieved from public repositories. Our deep learning models are based on Temporal Convolutional Networks (TCNs), which can predict multiple TF binding profiles while capturing correlations among TFs and their cooperative regulatory mechanisms. Our results suggest that multi-label learning, when it yields reliable predictive performance, can reveal biologically meaningful motifs and co-binding patterns consistent with known TF interactions, while also suggesting novel relationships and cooperation among TFs.
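To make the multi-label framing concrete, the following is a minimal sketch in plain Python, not the models used in this work. It shows how a DNA sequence can be one-hot encoded, how binding by several TFs can be represented as one multi-hot target vector instead of separate binary labels, how independent per-TF sigmoid outputs are scored with binary cross-entropy, and how a dilated causal convolution (the generic building block usually associated with TCNs) operates on a sequence. All function names and the TF names `TF_A`, `TF_B`, `TF_C` are hypothetical, and the use of causal padding here is an assumption for illustration only.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values; unknown bases (e.g. N) map to all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def multi_hot(bound_tfs, tf_names):
    """Build one target vector with a 1.0 for every TF that binds the sequence."""
    return [1.0 if tf in bound_tfs else 0.0 for tf in tf_names]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent per-TF sigmoid outputs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    return total / len(targets)


def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before the sequence start."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    tf_names = ["TF_A", "TF_B", "TF_C"]      # hypothetical label set
    features = one_hot("ACGTN")               # 5 positions x 4 channels
    target = multi_hot({"TF_A", "TF_C"}, tf_names)
    logits = [2.0, -1.5, 0.3]                 # made-up model outputs, one per TF
    print(len(features), target, multilabel_bce(logits, target))
    print(dilated_causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2))
```

Because each TF has its own sigmoid output, a sequence can be labeled as bound by several TFs at once, which is what distinguishes this setup from training one binary classifier per TF.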