Efficient design of genomic perturbation experiments is crucial for accelerating drug discovery and therapeutic target identification, yet exhaustive perturbation of the human genome remains infeasible because of the vast search space of potential genetic interactions and experimental constraints. Bayesian optimization (BO) has emerged as a powerful framework for selecting informative interventions, but existing approaches often fail to exploit domain-specific biological prior knowledge. We propose Biology-Informed Bayesian Optimization (BioBO), a method that integrates BO with multimodal gene embeddings and enrichment analysis, a widely used tool for gene prioritization in biology, to enhance both surrogate modeling and acquisition strategies. BioBO combines biologically grounded priors with acquisition functions in a principled framework that biases the search toward promising genes while retaining the ability to explore uncertain regions. In experiments on established public benchmarks and datasets, BioBO improves labeling efficiency by 25-40% and consistently outperforms conventional BO at identifying top-performing perturbations. Moreover, by incorporating enrichment analysis, BioBO yields pathway-level explanations for the selected perturbations, offering mechanistic interpretability that links designs to biologically coherent regulatory circuits.
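To make the idea of combining a biological prior with an acquisition function concrete, the following is a minimal sketch of one generic way to do it: multiply expected improvement (EI) by a per-gene prior weight raised to a power that decays over rounds. This is an illustrative assumption, not necessarily the BioBO formulation. The function names, the `beta` parameter, the decay schedule, and the toy numbers are all hypothetical, and the prior weights stand in for normalized enrichment-based scores.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def prior_weighted_scores(mu, sigma, prior, best, round_idx, beta=2.0):
    """Score each candidate as EI * prior ** (beta / round_idx).

    The exponent shrinks as round_idx grows, so the prior steers early
    rounds strongly and the surrogate dominates later ones.
    (Assumed schedule, chosen for illustration only.)
    """
    exponent = beta / round_idx
    return [
        expected_improvement(m, s, best) * (p ** exponent)
        for m, s, p in zip(mu, sigma, prior)
    ]


# Hypothetical toy values for three candidate genes.
mu = [0.50, 0.50, 0.20]     # surrogate posterior means
sigma = [0.10, 0.10, 0.40]  # surrogate posterior standard deviations
prior = [0.9, 0.1, 0.5]     # assumed enrichment-derived weights in (0, 1]
best = 0.45                 # best phenotype value observed so far

early = prior_weighted_scores(mu, sigma, prior, best, round_idx=1)
late = prior_weighted_scores(mu, sigma, prior, best, round_idx=50)
```

Genes 0 and 1 have identical surrogate predictions, so only the prior separates them. In round 1 the prior strongly favors gene 0, while by round 50 the two scores are nearly equal. This is one simple way to bias the search toward promising genes without permanently excluding low-prior candidates.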