The discovery of therapeutics to treat genetically driven pathologies relies on identifying genes involved in the underlying disease mechanisms. Existing approaches search over the billions of potential interventions to maximize the expected influence on the target phenotype. However, to reduce the risk of failure in later stages of trials, practical experiment design aims to find a set of interventions that maximally change a target phenotype via diverse mechanisms. We propose DiscoBAX, a sample-efficient method for maximizing the rate of significant discoveries per experiment while simultaneously probing for a wide range of mechanisms during a genomic experiment campaign. We provide theoretical guarantees of approximate optimality under standard assumptions and conduct a comprehensive experimental evaluation covering both synthetic and real-world experimental design tasks. DiscoBAX outperforms existing state-of-the-art methods for experimental design, selecting effective and diverse perturbations in biological systems.