High-throughput preclinical perturbation screens, in which the effects of genetic, chemical, or environmental perturbations are systematically tested on disease models, hold significant promise for machine-learning-enhanced drug discovery because of their scale and causal nature. Predictive models trained on such datasets can be used to (i) infer the perturbation response of previously untested disease models and (ii) characterise the biological context that affects perturbation response. Existing predictive models suffer from limited reproducibility, generalisability and interpretability. To address these issues, we introduce the Layered Ensemble of Autoencoders and Predictors (LEAP) framework, a general and flexible ensemble strategy that aggregates predictions from multiple regressors trained on the outputs of diverse gene expression representation models. LEAP consistently improves prediction performance on unscreened cell lines across modelling strategies. In particular, LEAP applied to perturbation-specific LASSO regressors (PS-LASSO) offers a favourable balance between near state-of-the-art performance and low computation time. We also propose an interpretability approach that combines model distillation and stability selection to identify the biological pathways most important for perturbation response prediction in LEAP. Our models have the potential to accelerate the drug discovery pipeline by guiding the prioritisation of preclinical experiments and by providing insights into the biological mechanisms involved in perturbation response. The code and datasets used in this work are publicly available.
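The layered idea can be illustrated with a minimal NumPy sketch. This is not the authors' released code, and several assumptions are made. The autoencoders are replaced by PCA models, each fitted on a bootstrap resample of the training cell lines to create diversity. Each member then fits one LASSO per perturbation (in the spirit of PS-LASSO) using a simple coordinate-descent solver, and members are aggregated by an unweighted mean. The interpretability step is approximated by distilling the ensemble into a sparse gene-level LASSO and running stability selection on half-subsamples; the actual method works at the pathway level. All function names (`lasso_cd`, `fit_pca`, `leap_predict`, `stability_selection`), hyperparameters and the synthetic data are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=100, tol=1e-6):
    """LASSO by cyclic coordinate descent, minimising
    (1 / (2n)) * ||y - X w - b||^2 + alpha * ||w||_1 (intercept b unpenalised)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                        # centring lets the intercept be recovered at the end
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    resid = y - y_mean                     # residual for w = 0
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of feature j with the residual that excludes feature j
            rho = Xc[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft-threshold
            resid -= Xc[:, j] * (w_new - w[j])
            max_step = max(max_step, abs(w_new - w[j]))
            w[j] = w_new
        if max_step < tol:
            break
    return w, y_mean - x_mean @ w


def fit_pca(X, k):
    """Stand-in representation model (hypothetical substitute for an autoencoder)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]


def embed(model, X):
    mu, comps = model
    return (X - mu) @ comps.T


def leap_predict(X_train, Y_train, X_test, n_members=5, k=10, alpha=0.05, seed=0):
    """Average predictions of (representation model, per-perturbation LASSO) members."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    member_preds = []
    for _ in range(n_members):
        # layer 1: a representation model fitted on a bootstrap resample of cell lines
        rep = fit_pca(X_train[rng.choice(n, size=n, replace=True)], k)
        Z_train, Z_test = embed(rep, X_train), embed(rep, X_test)
        # layer 2: one sparse regressor per perturbation (column of Y)
        preds = np.empty((X_test.shape[0], Y_train.shape[1]))
        for t in range(Y_train.shape[1]):
            w, b = lasso_cd(Z_train, Y_train[:, t], alpha)
            preds[:, t] = Z_test @ w + b
        member_preds.append(preds)
    # layer 3: aggregate the members by an unweighted mean
    return np.mean(member_preds, axis=0)


def stability_selection(X, y, alpha, n_subsamples=30, threshold=0.6, seed=0):
    """Selection frequency of each feature across LASSO fits on random half-subsamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        w, _ = lasso_cd(X[idx], y[idx], alpha)
        counts += w != 0
    freq = counts / n_subsamples
    return freq, np.flatnonzero(freq >= threshold)


# Synthetic example: 100 cell lines, 50 genes, 3 perturbations driven by 4 hidden programmes.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 4))
X = latent @ rng.normal(size=(4, 50)) + 0.1 * rng.normal(size=(100, 50))
Y = latent @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(100, 3))
X_tr, X_te, Y_tr = X[:80], X[80:], Y[:80]

Y_hat = leap_predict(X_tr, Y_tr, X_te)      # predictions for 20 "unscreened" cell lines

# Distillation: fit a sparse gene-level student to the ensemble's in-sample predictions,
# then check which genes the student selects stably for the first perturbation.
teacher = leap_predict(X_tr, Y_tr, X_tr)
freq, stable = stability_selection(X_tr, teacher[:, 0], alpha=0.05)
```

Averaging over members is the simplest aggregation choice. Because each member sees a different representation of the same expression data, the mean tends to be less sensitive to any single representation, which is the motivation the abstract gives for ensembling.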