Background: Antimicrobial resistance (AMR) is a global health threat. Although the WHO Global Antimicrobial Resistance and Use Surveillance System (GLASS) provides standardized data, population-level machine learning forecasting of resistance trends remains limited. Translating computational forecasts into policy also requires transparent interpretation mechanisms.

Methods: GLASS surveillance data (2021-2023) comprising 5,909 observations across 44 countries and five WHO regions were processed. A strict temporal split was used to prevent data leakage. Six models (naive baseline, linear regression, ridge regression, XGBoost, LightGBM, LSTM) were benchmarked on forecasting one-year-ahead resistance rates from features including prior-year resistance and antibiotic consumption. Evaluation metrics (MAE, RMSE, sMAPE) were computed, with 95% bootstrap confidence intervals for MAE. A local Retrieval-Augmented Generation (RAG) system built on Gemma 4 was implemented to translate forecast findings into policy guidance grounded in retrieved WHO documents.

Results: XGBoost achieved the best performance (test MAE = 6.13% [95% CI: 5.83-6.44]), an 85.3% error reduction relative to the naive baseline (MAE = 41.79%). SHAP analysis identified prior-year resistance as the dominant predictor (50.5% gain), confirming strong autoregressive behavior. Regional forecast error tracked closely with surveillance coverage, ranging from 3.65% in the European Region to 8.61% in the South-East Asia Region. The RAG pipeline generated accurate, source-attributed policy responses without fabricated citations.

Conclusion: Short-term resistance rates exhibit strong temporal autocorrelation and can be forecast accurately with gradient boosting. Coupling these forecasts with a hallucination-resistant RAG system provides a scalable, evidence-based decision-support framework for AMR governance.
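The evaluation steps named in the Methods (temporal split, MAE, sMAPE, bootstrap confidence interval for MAE) and the error-reduction figure in the Results can be illustrated with a minimal sketch. This is not the study's code: the function names, the percentile bootstrap variant, the number of resamples, and the sMAPE definition (mean of absolute error divided by the average of the absolute true and predicted values) are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def temporal_split(years, test_year):
    """Boolean masks: train on years strictly before test_year, test on test_year.

    Hypothetical helper; splitting by year keeps future observations
    out of the training set.
    """
    years = np.asarray(years)
    return years < test_year, years == test_year


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant; 0/0 terms count as 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = np.abs(y_true - y_pred)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom > 0)
    return float(100.0 * np.mean(ratio))


def bootstrap_mae_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for MAE (assumed variant), resampling observations."""
    rng = np.random.default_rng(seed)
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    n = errors.size
    idx = rng.integers(0, n, size=(n_boot, n))  # resample indices with replacement
    boot_mae = errors[idx].mean(axis=1)
    lo, hi = np.quantile(boot_mae, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def relative_error_reduction(baseline_mae, model_mae):
    """Percent reduction in MAE relative to a baseline."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae
```

Applied to the figures reported in the abstract, `relative_error_reduction(41.79, 6.13)` evaluates to roughly 85.3, which matches the stated reduction versus the naive baseline.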