Planning effective interventions in biological systems requires treatment-effect models that adapt to unseen biological contexts by identifying their specific underlying mechanisms. Yet single-cell perturbation datasets span only a handful of biological contexts, and existing methods cannot leverage new interventional evidence at inference time to adapt beyond their training data. To meta-learn a perturbation effect estimator, we present MapPFN, a prior-data fitted network (PFN) pre-trained on synthetic data generated from a prior over causal perturbations. Given a set of experiments, MapPFN uses in-context learning to predict post-perturbation distributions. Pre-trained on in silico gene knockouts alone, MapPFN identifies differentially expressed genes on par with models trained on real single-cell data. Fine-tuned, it consistently outperforms baselines across downstream datasets. Our code, model and data are available at https://marvinsxtr.github.io/MapPFN.