Interpreting gene clusters derived from RNA-seq data remains challenging, especially in antimicrobial resistance studies, where mechanistic context is essential for hypothesis generation. Conventional enrichment methods summarize co-expressed modules using predefined categories, but they often return sparse results and lack cluster-specific, literature-linked explanations. We present BIOGEN, an evidence-grounded multi-agent framework for post hoc interpretation of RNA-seq transcriptional modules that integrates biomedical retrieval, structured reasoning, and multi-critic verification. BIOGEN organizes evidence from PubMed and UniProt into traceable cluster-level interpretations with explicit supporting evidence and confidence tiering. On a primary Salmonella enterica dataset, BIOGEN achieved strong evidence-grounding performance while reducing the hallucination rate from 0.67 in an unconstrained LLM setting to 0.00 under retrieval-grounded configurations. Compared with KEGG/ORA and GO/ORA, BIOGEN provided broader biological coverage, identifying substantially more biological themes per cluster. Across four additional bacterial RNA-seq datasets, BIOGEN maintained a hallucination rate of zero and consistently outperformed KEGG/ORA in cluster-level thematic coverage. These results position BIOGEN as an interpretive support framework that complements transcriptomic workflows through improved traceability, evidential transparency, and biological coverage.