Human genetic diseases often arise from point mutations, which underscores the need for precise genome editing techniques. Among these, base editing stands out because it allows targeted alterations at the single-nucleotide level. However, its clinical application is hindered by low editing efficiency and unintended mutations, so designs must go through extensive trial-and-error experimentation in the laboratory. To speed up this process, we present an attention-based two-stage machine learning model that learns to predict the likelihood of all possible editing outcomes for a given genomic target sequence. We further propose a multi-task learning schema to jointly learn multiple base editors (i.e., variants) at once. Our model's predictions correlate strongly and consistently with the actual experimental results on multiple datasets and base editor variants. These results further validate the model's capacity to enhance and accelerate the process of refining base editing designs.
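To make the prediction target concrete, the following is a minimal sketch of what "all possible editing outcomes" and a two-stage likelihood could look like. It is illustrative only and not the model described above: the editing window, the A-to-G conversion, the function names `enumerate_outcomes` and `combine_two_stage`, and the example probabilities are all assumptions, as is the reading of "two-stage" as an overall editing probability followed by a conditional distribution over edited sequences.

```python
from itertools import product


def enumerate_outcomes(target, window=(3, 8), src="A", dst="G"):
    """List every sequence reachable by converting any subset of `src`
    bases inside the window (0-based, end-exclusive) to `dst`.
    The window and the A->G conversion are hypothetical defaults."""
    start, end = window
    sites = [i for i in range(start, min(end, len(target))) if target[i] == src]
    outcomes = []
    for mask in product([False, True], repeat=len(sites)):
        seq = list(target)
        for site, edited in zip(sites, mask):
            if edited:
                seq[site] = dst
        outcomes.append("".join(seq))
    return outcomes  # outcomes[0] is the unedited (wild-type) sequence


def combine_two_stage(target, p_edited, conditional):
    """Assumed decomposition: P(outcome) = P(edited) * P(outcome | edited),
    with the unedited sequence receiving 1 - P(edited)."""
    joint = {target: 1.0 - p_edited}
    for seq, p in conditional.items():
        joint[seq] = p_edited * p
    return joint


# Hypothetical target sequence and made-up probabilities, for illustration.
target = "ACGTACGAAC"
outcomes = enumerate_outcomes(target)
conditional = {seq: 1.0 / (len(outcomes) - 1) for seq in outcomes[1:]}
joint = combine_two_stage(target, p_edited=0.6, conditional=conditional)
for seq, prob in joint.items():
    print(seq, round(prob, 3))
```

In a learned model, `p_edited` and `conditional` would come from the two prediction stages rather than being fixed by hand. The number of candidate outcomes grows as 2^k for k editable bases in the window, which is one reason a model that scores every outcome directly is useful.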