Variant calling is a fundamental task in genomic research, essential for detecting genetic variations such as single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). This paper presents an enhancement to DeepChem, a widely used open-source drug discovery framework, through the integration of DeepVariant. Specifically, we introduce a variant calling pipeline that leverages DeepVariant's convolutional neural network (CNN) architecture to improve the accuracy and reliability of variant detection. The pipeline comprises four stages: realignment of sequencing reads, candidate variant detection, pileup image generation, and variant classification using a modified Inception v3 model. Our work adds a modular and extensible variant calling pipeline to DeepChem and enables future work that integrates DeepChem's drug discovery infrastructure more tightly with bioinformatics pipelines.
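To make the candidate detection and pileup generation stages concrete, the following is a minimal, self-contained sketch in plain Python. It is not the DeepChem or DeepVariant API: the function names `find_candidates` and `pileup_rows`, the thresholds, the gapless `(start, sequence)` read format, and the single-channel integer encoding are all assumptions made for illustration. DeepVariant's real pileup images carry more information per read than the base identity shown here, and the realignment and CNN classification stages are omitted.

```python
from collections import Counter

# Hypothetical integer encoding for bases; 0 marks "no coverage".
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def find_candidates(reference, reads, min_fraction=0.2, min_count=2):
    """Flag positions where a non-reference base is well supported.

    `reads` is a list of (start, sequence) tuples, assumed to be
    gapless alignments against `reference` (an illustrative format).
    """
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    candidates = []
    for pos, ref_base in enumerate(reference):
        depth = sum(counts[pos].values())
        if depth == 0:
            continue
        for base, n in sorted(counts[pos].items()):
            if base != ref_base and n >= min_count and n / depth >= min_fraction:
                candidates.append((pos, ref_base, base, n, depth))
    return candidates


def pileup_rows(reference, reads, center, width=5):
    """Encode a window around `center` as rows of integers.

    Row 0 is the reference; each later row is one overlapping read.
    """
    lo = center - width // 2
    positions = range(lo, lo + width)
    rows = [[
        BASE_CODES.get(reference[p], 0) if 0 <= p < len(reference) else 0
        for p in positions
    ]]
    for start, seq in reads:
        end = start + len(seq)
        if start < lo + width and end > lo:  # read overlaps the window
            rows.append([
                BASE_CODES.get(seq[p - start], 0) if start <= p < end else 0
                for p in positions
            ])
    return rows


# Toy data: two of the four reads carry a T where the reference has A.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTTCGT"), (3, "TTCGTA"), (4, "ACGTAC")]

candidates = find_candidates(reference, reads)
rows = pileup_rows(reference, reads, center=candidates[0][0])
```

In a full pipeline, each candidate's pileup would be passed to the classifier, which assigns genotype likelihoods; here the rows are only a stand-in for that image input.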