We apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery, building on comprehensive databases of metabolic processes known as genome-scale metabolic network models (GEMs). However, GEMs do not always correctly predict host behaviour, and learning the intricate genetic interactions they encode poses both computational and empirical challenges. To address these challenges, we describe a novel approach, Boolean Matrix Logic Programming (BMLP), which uses Boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by using active learning to guide informative experimentation. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely used bacterial host in an interpretable, logical representation as Datalog programs. Notably, $BMLP_{active}$ successfully learns the interaction between a gene pair from fewer training examples than random experimentation, overcoming the growth of the experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models and offers a realistic approach to a self-driving lab for microbial engineering.
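The core idea behind BMLP is to evaluate a Datalog program with Boolean matrix operations instead of clause-by-clause resolution. The sketch below illustrates that idea on a small scale. It uses a hypothetical, simplified program and is not the authors' GEM encoding or the $BMLP_{active}$ implementation. In this program, a compound is producible if it is in the growth medium, or if it is a product of an enabled reaction whose reactants are all producible. Reactant and product relations are stored as 0/1 matrices, and the least model is computed by repeating Boolean matrix-vector steps until a fixpoint is reached. The function names `fires`, `produce` and `least_fixpoint` are illustrative assumptions, as is the `enabled` mask that stands in for a gene knockout.

```python
"""Minimal sketch: evaluating a Datalog program with Boolean matrices.

Hypothetical program (an illustration, not the authors' encoding):
    producible(C) :- medium(C).
    producible(C) :- product(R, C), fires(R).
    fires(R)      :- enabled(R), every reactant of R is producible.
"""
from typing import List, Optional

Matrix = List[List[int]]  # 0/1 entries, one row per reaction, one column per compound
Vector = List[int]        # 0/1 entries


def fires(reactants: Matrix, state: Vector, enabled: Vector) -> Vector:
    # Reaction r is blocked if the Boolean product (reactants . NOT state)_r is 1,
    # i.e. some reactant is missing; it fires when enabled and not blocked.
    return [
        int(bool(e) and not any(r and not s for r, s in zip(row, state)))
        for row, e in zip(reactants, enabled)
    ]


def produce(products: Matrix, fired: Vector, n_compounds: int) -> Vector:
    # Boolean vector-matrix product (OR of ANDs): compound c is derived
    # if at least one fired reaction lists c as a product.
    return [
        int(any(f and row[c] for f, row in zip(fired, products)))
        for c in range(n_compounds)
    ]


def least_fixpoint(medium: Vector, reactants: Matrix, products: Matrix,
                   enabled: Optional[Vector] = None) -> Vector:
    # Repeat the immediate-consequence step until nothing new is derived.
    # The step is monotone over a finite set of compounds, so the loop terminates.
    if enabled is None:
        enabled = [1] * len(reactants)  # no knockouts
    state = list(medium)
    while True:
        fired = fires(reactants, state, enabled)
        derived = produce(products, fired, len(state))
        new_state = [s | d for s, d in zip(state, derived)]
        if new_state == state:
            return state
        state = new_state


# Toy network: compounds A, B, C, D, E (indices 0-4);
# reactions r0: A -> B, r1: A + B -> C, r2: D -> E; the medium contains only A.
reactants = [[1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0]]
products = [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]]
medium = [1, 0, 0, 0, 0]

print(least_fixpoint(medium, reactants, products))             # [1, 1, 1, 0, 0]
print(least_fixpoint(medium, reactants, products, [1, 0, 1]))  # r1 knocked out: [1, 1, 0, 0, 0]
```

In this sketch, disabling r1 (as a hypothetical gene knockout might) removes C from the least model. Comparing least models under single and combined knockouts against observed phenotypes is one way such an encoding could expose gene interactions. How $BMLP_{active}$ actually does this is described by the source, not by this sketch.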