Accurate forecasting of infectious disease incidence is critical for public health planning and timely intervention. While most data-driven forecasting approaches rely primarily on historical data from a single country, such data are often limited in length and variability, restricting the performance of machine learning (ML) models. In this work, we investigate a cross-country learning approach for infectious disease forecasting, in which a single model is trained on time series data from multiple countries and evaluated on a country of interest. This setting enables the model to exploit shared epidemic dynamics across countries and to benefit from an enlarged training set. We examine this approach through a case study on COVID-19 case forecasting in Cyprus, using surveillance data from European countries. We evaluate multiple ML models and analyse the impact of the lookback window length and cross-country "data augmentation" on multi-step forecasting performance. Our results show that incorporating data from other countries can lead to consistent improvements over models trained solely on national data. Although the empirical focus is on Cyprus and COVID-19, the proposed framework and findings are applicable to infectious disease forecasting more broadly, particularly in settings with limited national historical data.
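To make the cross-country setup concrete, the following is a minimal sketch of how pooled training data might be built from several countries' series with a fixed lookback window and a multi-step horizon. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`make_windows`, `pooled_training_set`, `fit_linear`, `predict`), the country labels, and the synthetic series are all hypothetical, and an ordinary least-squares model stands in for whichever ML models the study evaluates.

```python
import numpy as np


def make_windows(series, lookback, horizon):
    """Slice one series into supervised pairs: `lookback` inputs -> `horizon` targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        # Series too short to yield even one window.
        return np.empty((0, lookback)), np.empty((0, horizon))
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y


def pooled_training_set(series_by_country, lookback, horizon):
    """Concatenate the windows of every country into a single training set."""
    parts = [make_windows(s, lookback, horizon) for s in series_by_country.values()]
    X = np.concatenate([p[0] for p in parts])
    Y = np.concatenate([p[1] for p in parts])
    return X, Y


def fit_linear(X, Y):
    """Multi-output least-squares fit with an intercept (stand-in for any ML model)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W


def predict(W, X):
    """Apply the fitted weights to one or more lookback windows."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    A = np.hstack([X, np.ones((len(X), 1))])
    return A @ W


if __name__ == "__main__":
    # Synthetic epidemic-like waves; NOT real surveillance data.
    rng = np.random.default_rng(0)
    t = np.arange(120)

    def wave(shift, scale):
        return scale * np.exp(-0.5 * ((t - shift) / 15.0) ** 2) + rng.normal(0, 0.01, t.size)

    target = wave(70, 1.0)                      # hypothetical country of interest
    others = {"country_a": wave(50, 1.2),       # hypothetical additional countries
              "country_b": wave(60, 0.8)}

    lookback, horizon, split = 14, 7, 60
    train = {"target": target[:split], **others}

    # National-only baseline versus the pooled (cross-country) model.
    W_national = fit_linear(*make_windows(target[:split], lookback, horizon))
    W_pooled = fit_linear(*pooled_training_set(train, lookback, horizon))

    # Evaluate both models on the held-out part of the target series only.
    X_test, Y_test = make_windows(target[split:], lookback, horizon)
    for name, W in [("national", W_national), ("pooled", W_pooled)]:
        mae = np.mean(np.abs(predict(W, X_test) - Y_test))
        print(f"{name:>8} MAE: {mae:.4f}")
```

Varying `lookback` in this sketch corresponds to the lookback window analysis mentioned above, and adding or removing entries from `others` corresponds to changing how much cross-country data is pooled. The numbers it prints describe the synthetic waves only and say nothing about the reported results.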