Supervised machine learning models and public surveillance data have been employed for infectious disease forecasting in many settings. These models leverage various data sources capturing drivers of disease spread, such as climate conditions or human behavior. However, few models have incorporated the organizational structure of different geographic locations for forecasting. Traveling waves of seasonal outbreaks have been reported for dengue, influenza, and other infectious diseases, and many of the drivers of infectious disease dynamics may be shared across cities because of their geographic or socioeconomic proximity. In this study, we developed a machine learning model that predicts case counts of four infectious diseases across Brazilian cities one week ahead by incorporating information from related cities. We compared two criteria for selecting related cities: geographic distance and GDP per capita. Incorporating information from geographically proximate cities improved predictive performance for two of the four diseases, namely COVID-19 and Zika. We also discuss how anomalous contagion patterns affect the forecasts, as well as the limitations of the proposed methodology.
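As a rough illustration of the idea summarized above, the following minimal sketch selects the geographically closest cities to a target city and builds a one-week-ahead supervised dataset from lagged weekly case counts. It is not the study's implementation: the function names, the number of related cities `k`, the number of lags `n_lags`, the approximate city coordinates, and the synthetic Poisson case counts are all assumptions made for illustration, and the abstract does not specify which features or learner were actually used.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_cities(coords, target, k):
    """Indices of the k cities closest to `target`, excluding the target itself."""
    lat, lon = coords[:, 0], coords[:, 1]
    dist = haversine_km(lat[target], lon[target], lat, lon)
    dist[target] = np.inf  # never select the target as its own neighbor
    return np.argsort(dist)[:k]


def one_week_ahead_dataset(cases, target, related, n_lags):
    """Build (X, y) from a (weeks, cities) array of weekly case counts.

    Each row of X holds the previous n_lags weekly counts of the target
    and its related cities; y is the target's count in the following week.
    """
    cols = [target, *related]
    X, y = [], []
    for t in range(n_lags, cases.shape[0]):
        X.append(cases[t - n_lags:t, cols].ravel())
        y.append(cases[t, target])
    return np.array(X), np.array(y)


# Approximate (lat, lon) in degrees, used only for illustration:
# 0 = São Paulo, 1 = Rio de Janeiro, 2 = Campinas, 3 = Manaus
coords = np.array([
    [-23.55, -46.63],
    [-22.91, -43.17],
    [-22.91, -47.06],
    [-3.12, -60.02],
])

# Synthetic weekly case counts (30 weeks x 4 cities); NOT real surveillance data.
rng = np.random.default_rng(0)
cases = rng.poisson(20, size=(30, 4))

related = nearest_cities(coords, target=0, k=2)
X, y = one_week_ahead_dataset(cases, target=0, related=related, n_lags=3)
print(related, X.shape, y.shape)
```

Any supervised regressor could then be fit on `X` and `y`. Swapping the distance function for a difference in GDP per capita would give an analogous sketch of the socioeconomic selection criterion.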