Epidemic forecasting has become an integral part of real-time response to infectious disease outbreaks. While collaborative ensembles of statistical and machine learning models have become the norm for real-time forecasting, standardized benchmark datasets for evaluating such methods are lacking. Further, little is known about how these methods perform on novel outbreaks with limited historical data. In this paper, we propose IDOBE, a curated collection of epidemiological time series focused on outbreak forecasting. IDOBE compiles data from multiple repositories, spanning over a century of surveillance across U.S. states and global locations. We apply derivative-based segmentation to extract over 10,000 outbreaks covering multiple outcomes, such as cases and hospitalizations, for 13 diseases. We use a variety of information-theoretic and distributional measures to quantify the epidemiological diversity of the dataset. Finally, we perform multi-horizon short-term forecasting (1- to 4-week-ahead) throughout the progression of each outbreak using 11 baseline models and report their performance. In addition to standard point-forecast metrics such as NMSE and MAPE, we use probabilistic scoring rules such as the Normalized Weighted Interval Score (NWIS) to quantify performance. We find that MLP-based methods have the most robust performance, while statistical methods have a slight edge during the pre-peak phase. The IDOBE dataset and baselines are publicly released at https://github.com/NSSAC/IDOBE to enable standardized, reproducible benchmarking of outbreak forecasting methods.
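The abstract names NWIS but does not define it. As an illustration only, the sketch below implements the widely used weighted interval score (Bracher et al., 2021): a median term weighted by 1/2 plus K central prediction intervals, each weighted by alpha_k/2, divided by K + 1/2. The normalization step used by IDOBE is not stated here, so `normalized_wis` (a hypothetical helper) simply divides the mean WIS by the mean absolute observed value; treat that choice as an assumption, not the benchmark's actual definition.

```python
from typing import Sequence


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Interval score for a central (1 - alpha) prediction interval [lower, upper]."""
    score = upper - lower  # sharpness: width of the interval
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # penalty when the observation falls below
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # penalty when the observation falls above
    return score


def weighted_interval_score(
    median: float,
    lowers: Sequence[float],
    uppers: Sequence[float],
    alphas: Sequence[float],
    y: float,
) -> float:
    """WIS with weight 1/2 on the median and alpha_k / 2 on each interval."""
    if not (len(lowers) == len(uppers) == len(alphas)):
        raise ValueError("lowers, uppers and alphas must have equal length")
    k = len(alphas)
    total = 0.5 * abs(y - median)  # absolute error of the median forecast
    for lo, hi, a in zip(lowers, uppers, alphas):
        total += (a / 2.0) * interval_score(lo, hi, y, a)
    return total / (k + 0.5)


def normalized_wis(scores: Sequence[float], observed: Sequence[float]) -> float:
    """HYPOTHETICAL normalization: mean WIS divided by mean absolute observation."""
    scale = sum(abs(v) for v in observed) / len(observed)
    return (sum(scores) / len(scores)) / scale


if __name__ == "__main__":
    # One 50% interval [8, 12] around a median of 10; the observation is 13.
    print(weighted_interval_score(10.0, [8.0], [12.0], [0.5], 13.0))
```

In this example the interval score is 4 (width) + 4 (overshoot penalty) = 8, weighted by 0.25 to give 2; the median term adds 0.5 × 3 = 1.5, and dividing 3.5 by 1.5 yields about 2.33. Scaling by the magnitude of the observed series is one way to compare scores across outbreaks of very different sizes, which is presumably why a normalized variant is reported.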