Recent advances in drug discovery have demonstrated that incorporating side information (e.g., chemical properties of drugs and genomic information about diseases) often greatly improves prediction performance. However, these side features vary widely in relevance and are often noisy and high-dimensional. We propose Bayesian Variable Selection-Guided Inductive Matrix Completion (BVSIMC), a new Bayesian model that performs variable selection on side features in drug discovery. By learning sparse latent embeddings, BVSIMC improves both predictive accuracy and interpretability. We validate our method through simulation studies and two drug discovery applications: (1) prediction of drug resistance in Mycobacterium tuberculosis, and (2) prediction of new drug-disease associations in computational drug repositioning. On both synthetic and real data, BVSIMC achieves better predictive performance than several state-of-the-art methods. In the two real-data applications, BVSIMC also identifies the most clinically meaningful side features.