Predicting gene regulation responses to biological perturbations requires reasoning about the underlying biological causality. While large language models (LLMs) show promise for such tasks, they are often overwhelmed by the entangled nature of high-dimensional perturbation results. Moreover, recent work has focused primarily on genetic perturbations in single-cell experiments, leaving bulk-cell chemical perturbations, which are central to drug discovery, largely unexplored. Motivated by this gap, we present LINCSQA, a novel benchmark for predicting target gene regulation under complex chemical perturbations in bulk-cell environments. We further propose PBio-Agent, a multi-agent framework that integrates difficulty-aware task sequencing with iterative knowledge refinement. Our key insight is that genes affected by the same perturbation share causal structure, so confidently predicted genes can provide context for more challenging cases. The framework employs specialized agents enriched with biological knowledge graphs, while a synthesis agent integrates their outputs and specialized judges ensure logical coherence. PBio-Agent outperforms existing baselines on both LINCSQA and PerturbQA, enabling even smaller models to predict and explain complex biological processes without additional training.
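The easy-to-hard idea behind difficulty-aware sequencing can be illustrated with a minimal sketch. This is not the PBio-Agent implementation: the function `predict_easy_to_hard`, the `predict` callback signature, the number of refinement rounds, and the toy genes and confidence values are all hypothetical, chosen only to show how confidently predicted genes might serve as context for harder ones.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: (gene, context of already-predicted labels) -> (label, confidence)
Predictor = Callable[[str, Dict[str, str]], Tuple[str, float]]


def predict_easy_to_hard(genes: List[str], predict: Predictor, rounds: int = 2) -> Dict[str, str]:
    """Predict confident genes first, then reuse their labels as context."""
    # Context-free first pass, used only to estimate per-gene difficulty.
    first_pass = {g: predict(g, {}) for g in genes}
    # Highest confidence first, so easy genes are resolved before hard ones.
    order = sorted(genes, key=lambda g: first_pass[g][1], reverse=True)

    context: Dict[str, str] = {}
    for _ in range(rounds):  # iterative refinement: revisit each gene with updated context
        for gene in order:
            others = {g: lab for g, lab in context.items() if g != gene}
            label, _ = predict(gene, others)
            context[gene] = label
    return context


def toy_predict(gene: str, context: Dict[str, str]) -> Tuple[str, float]:
    """Toy stand-in for an LLM agent; all values below are made up."""
    priors = {"GENE_A": ("down", 0.9), "GENE_B": ("down", 0.8)}
    if gene in priors:
        return priors[gene]
    if context:
        # Ambiguous gene: follow the majority label of already-resolved genes.
        labels = list(context.values())
        return max(sorted(set(labels)), key=labels.count), 0.6
    return "up", 0.3  # low-confidence guess when no context is available


result = predict_easy_to_hard(["GENE_C", "GENE_A", "GENE_B"], toy_predict)
```

In this toy setup the ambiguous gene is guessed as "up" in isolation but is labeled "down" once the two confident genes are available as context, which is the kind of effect the sequencing is meant to exploit.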