Probabilistic forecasting of infectious diseases is crucial for public health, but it relies on labor-intensive manual model curation by expert modeling teams. This bespoke development process limits scaling to finer geographic resolutions and to emerging pathogens. Here, we present an autonomous system that uses Large Language Model (LLM)-guided tree search to iteratively generate, evaluate, and optimize executable forecasting software. In a fully prospective, real-time evaluation during the 2025-2026 US respiratory season, the system autonomously discovered methodologically diverse models for influenza, COVID-19, and respiratory syncytial virus (RSV). Aggregating these machine-generated models yielded an ensemble that consistently matched or outperformed the gold-standard, human-curated Centers for Disease Control and Prevention (CDC) hub ensembles out of sample. The system also successfully navigated data-scarce "cold start" scenarios for RSV. Moreover, controlled retrospective ablations revealed that optimizing log-scale distance metrics prevents reward hacking, while an automated judge-in-the-loop ensures structural fidelity to complex scientific theories. By autonomously translating epidemiological theory into accurate, transparent code, this framework overcomes the modeling labor bottleneck, enabling rapid deployment of expert-level disease forecasting at unprecedented scales.
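As a rough illustration of two ideas named in the abstract, aggregating several models into an ensemble and scoring forecasts on a log scale, the following minimal Python sketch may help. It is not the system's implementation: the abstract does not specify the aggregation rule or the exact metric, so the quantile-wise median ensemble, the pinball loss on log1p-transformed counts, the function names `median_ensemble` and `log_pinball_loss`, and the example numbers are all assumptions chosen for illustration.

```python
import math
from statistics import median


def median_ensemble(forecasts):
    """Quantile-wise median across models (assumed aggregation rule).

    forecasts: list of per-model lists, one predicted value per quantile level.
    """
    # zip(*forecasts) groups the values predicted for each quantile level.
    return [median(values) for values in zip(*forecasts)]


def log_pinball_loss(levels, predicted, observed):
    """Mean pinball (quantile) loss computed on log1p-transformed counts.

    This is a hypothetical stand-in for a "log-scale distance metric".
    """
    log_obs = math.log1p(observed)
    losses = []
    for q, pred in zip(levels, predicted):
        diff = log_obs - math.log1p(pred)
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        losses.append(max(q * diff, (q - 1.0) * diff))
    return sum(losses) / len(losses)


# Made-up example: three models, three quantile levels, weekly counts.
levels = [0.1, 0.5, 0.9]
models = [
    [80, 100, 130],
    [90, 120, 160],
    [70, 110, 150],
]
ensemble = median_ensemble(models)
score = log_pinball_loss(levels, ensemble, observed=115)
```

Scoring after a log transform makes the loss reflect relative rather than absolute error, so locations or weeks with large counts do not dominate the objective. Whether this matches the metric used in the ablations cannot be determined from the abstract alone.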