Advances in combinatorial CRISPR screening technologies enable the identification of synergistic gene combinations at scale. This is crucial for developing effective combination therapies, but the size of the combinatorial space makes exhaustive experimentation infeasible. We introduce NAIAD, an active learning framework that efficiently discovers optimal gene pairs capable of driving cells toward desired cellular phenotypes. NAIAD leverages single-gene perturbation effects and adaptive gene embeddings that scale with the training data size, mitigating overfitting in small-sample learning while capturing complex gene interactions as more data is collected. Evaluated on four CRISPR combinatorial perturbation datasets totaling over 350,000 genetic interactions, NAIAD, when trained on small datasets, outperforms existing models by up to 40% relative to the second-best model. NAIAD's recommendation system prioritizes gene pairs with the maximum predicted effects, yielding the highest marginal gain in each AI-experiment round and accelerating discovery with fewer CRISPR experimental iterations. Our NAIAD framework (https://github.com/NeptuneBio/NAIAD) improves the identification of novel, effective gene combinations, enabling more efficient CRISPR library design and offering promising applications in genomics research and therapeutic development.
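The recommendation step described above (rank unmeasured gene pairs by predicted effect, test the top ones, repeat) can be illustrated with a minimal Python sketch. This is not NAIAD's implementation: the predictor here is a plain additive baseline built from single-gene effects, it is never retrained between rounds, and the names `additive_baseline`, `select_top_k`, `run_rounds`, and `measure` are hypothetical, chosen only for illustration.

```python
import numpy as np


def additive_baseline(single_effects, pairs):
    """Predict each pair's effect as the sum of its two single-gene effects.

    Assumption: a stand-in predictor, not the NAIAD model.
    """
    pairs = np.asarray(pairs)
    return single_effects[pairs[:, 0]] + single_effects[pairs[:, 1]]


def select_top_k(predicted, k):
    """Return indices of the k largest predicted effects, largest first."""
    order = np.argsort(-predicted, kind="stable")
    return order[:k]


def run_rounds(single_effects, pairs, measure, k, n_rounds):
    """Greedy loop: predict, pick the top-k unmeasured pairs, measure, repeat.

    `measure` is a hypothetical callable standing in for a wet-lab experiment.
    """
    pairs = np.asarray(pairs)
    measured = {}
    for _ in range(n_rounds):
        remaining = [i for i in range(len(pairs)) if i not in measured]
        if not remaining:
            break
        # A learned model would be refit on `measured` here; this sketch
        # keeps using the fixed additive baseline.
        preds = additive_baseline(single_effects, pairs[remaining])
        for j in select_top_k(preds, k):
            idx = remaining[j]
            measured[idx] = measure(tuple(pairs[idx]))
    return measured
```

In a full active learning setup, the fixed baseline would be replaced by a model that is updated with each round's measurements, so that later rounds benefit from the interactions already observed.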