Not All Accuracy Is Equal: Prioritizing Independence in Infectious Disease Forecasting

Ensemble forecasts have become a cornerstone of large-scale disease response, underpinning decision-making at agencies such as the US Centers for Disease Control and Prevention (CDC). Their growing use reflects the goal of combining multiple models to improve accuracy and stability relative to relying on any single model. However, while ensembles regularly demonstrate stability against individual model failures, improved accuracy is not guaranteed. During the COVID-19 pandemic, the CDC's multi-model ensemble outperformed the best single model by only 1%, and CDC flu ensembles have often ranked below individual models. Prior work has established that ensemble performance depends critically on diversity: when models make independent errors, combining them yields substantial gains. In practice, however, this diversity is often lacking. Here, we propose that this is due in part to how models are developed and selected: both modelers and ensemble builders optimize for stand-alone accuracy rather than ensemble contribution, and most epidemic forecasts are built from a small set of approaches trained on the same surveillance data. The result is highly correlated errors, which limit the benefit of ensembling. This suggests that, in developing models and ensembles, we should prioritize models that contribute complementary information over models that replicate existing approaches. We present a toy example illustrating the theoretical cost of correlated errors, analyze correlations among COVID-19 forecasting models, and propose improvements to model fitting and ensemble construction that foster genuine diversity. Ensembles built with this principle in mind produce forecasts that are more robust and more valuable for epidemic preparedness and response.
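The cost of correlated errors can be illustrated with a textbook identity; this is not necessarily the toy example used in the paper. Assume n models whose forecast errors have zero mean, a common variance sigma^2, and a common pairwise correlation rho between 0 and 1. The error of their equal-weight mean then has variance sigma^2 * (1/n + (1 - 1/n) * rho), which falls toward zero as n grows only when rho = 0 and otherwise levels off at rho * sigma^2. The Python sketch below checks the formula against a simulation. The function names and the shared-shock construction are illustrative assumptions, not taken from the source.

```python
import math
import random


def ensemble_error_variance(n_models, sigma2, rho):
    """Closed-form variance of the equal-weight mean of n_models errors,
    assuming each error has variance sigma2 and every pair has correlation rho."""
    return sigma2 * (1.0 / n_models + (1.0 - 1.0 / n_models) * rho)


def simulate_ensemble_error_variance(n_models, sigma2, rho, n_draws=20_000, seed=0):
    """Monte Carlo estimate of the same quantity (requires 0 <= rho <= 1).

    Each model's error is built as a shared shock plus an independent shock,
    which gives variance sigma2 and pairwise correlation rho by construction.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    shared_weight = math.sqrt(rho)
    own_weight = math.sqrt(1.0 - rho)

    ensemble_errors = []
    for _ in range(n_draws):
        shared = rng.gauss(0.0, 1.0)  # shock common to every model
        errors = [
            sigma * (shared_weight * shared + own_weight * rng.gauss(0.0, 1.0))
            for _ in range(n_models)
        ]
        ensemble_errors.append(sum(errors) / n_models)

    mean = sum(ensemble_errors) / n_draws
    return sum((e - mean) ** 2 for e in ensemble_errors) / n_draws


if __name__ == "__main__":
    # Compare theory and simulation for a 10-model ensemble with unit error variance.
    for rho in (0.0, 0.5, 0.8):
        theory = ensemble_error_variance(10, 1.0, rho)
        simulated = simulate_ensemble_error_variance(10, 1.0, rho)
        print(f"rho={rho:.1f}  theory={theory:.3f}  simulated={simulated:.3f}")
```

Under these assumptions, ten independent models cut the error variance to a tenth of a single model's, whereas ten models with pairwise correlation 0.8 leave it at 0.82 of a single model's. Adding more models that err in the same way therefore buys very little.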
