Predicting transcriptional responses to genetic perturbations is a central problem in functional genomics. In practice, perturbation responses are rarely gene-independent; they manifest as coordinated, program-level transcriptional changes among functionally related genes. However, most existing methods do not explicitly model this coordination, because they adopt gene-wise modeling paradigms and rely on static biological priors that cannot capture dynamic program reorganization. To address these limitations, we propose scBIG, a module-inductive perturbation prediction framework that explicitly models coordinated gene programs. scBIG induces coherent gene programs from data via Gene-Relation Clustering, captures inter-program interactions through a Gene-Cluster-Aware Encoder, and preserves modular coordination using structure-aware alignment objectives. These structured representations are then modeled with conditional flow matching to enable flexible and generalizable perturbation prediction. Extensive experiments on multiple single-cell perturbation benchmarks show that scBIG consistently outperforms state-of-the-art methods, particularly in unseen and combinatorial perturbation settings, with an average improvement of 6.7% over the strongest baselines. The code is available at https://github.com/ttruan2426-dot/scBIG.
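To make the conditional flow matching step mentioned above more concrete, the following is a minimal sketch of the generic technique, not scBIG's implementation. It assumes a linear probability path from control-cell representations `x0` to perturbed-cell representations `x1`; the abstract does not specify the path, the conditioning, or the network, so all names here (`cfm_pair`, `cfm_loss`, `euler_integrate`, `cond`) are hypothetical.

```python
import numpy as np

def cfm_pair(x0, x1, t):
    """Return the point x_t on a linear path and its target velocity.

    Assumption: linear interpolant x_t = (1 - t) * x0 + t * x1, whose
    velocity along the path is the constant x1 - x0.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # one time value per cell
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t

def cfm_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - u_t) ** 2, axis=1)))

def euler_integrate(velocity_fn, x0, cond, n_steps=10):
    """Generate a prediction by integrating dx/dt = v(x, t, cond) from t=0 to 1."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    return x

# Toy data standing in for learned cell representations (purely illustrative).
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 3))   # hypothetical control-cell representations
x1 = x0 + 2.0                  # hypothetical perturbed-cell representations
t = rng.uniform(size=4)

x_t, u_t = cfm_pair(x0, x1, t)

# An oracle velocity field in place of a trained, perturbation-conditioned network.
oracle = lambda x, t, cond: x1 - x0
x_hat = euler_integrate(oracle, x0, cond=None)
```

In training, a conditional network would take `x_t`, `t`, and a perturbation embedding as input, and `cfm_loss` would be minimized against `u_t`. At inference, the same integrator would transport control cells toward the predicted perturbed state.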