Forecasting whether a novel disease outbreak will or will not occur is essential for disease management. Here, we develop a general model, trained without any real-world data, that accurately forecasts both outbreaks and non-outbreaks. Our framework uses a feature-based time series classification method. We tested it on synthetic data from a Susceptible-Infected-Recovered model with slowly changing, noisy disease dynamics. Outbreak sequences undergo a transcritical bifurcation within a specified future time window, whereas non-outbreak (null bifurcation) sequences do not. We identified incipient differences between time series of infectives that lead to future outbreaks and those that do not. These differences are reflected in 22 statistical features and 5 early warning signal indicators. Classifier performance, measured by the area under the receiver operating characteristic curve (AUC), ranged from 0.99 for large expanding windows of training data to 0.7 for small rolling windows. Real-world performance of the classifiers was tested on two empirical datasets, COVID-19 data from Singapore and SARS data from Hong Kong, with two classifiers exhibiting high accuracy. In summary, we showed that statistical features distinguish outbreak from non-outbreak sequences, and that these differences can be detected in both synthetic and real-world data sets well before potential outbreaks occur.
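The kind of synthetic experiment described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the SIR variant with demography, every parameter value, and the function names `simulate_infectives`, `rolling_indicators`, and `auc` are assumptions made for illustration. Only two commonly used early warning indicators (rolling variance and lag-1 autocorrelation) are computed, since the 22 features and 5 indicators are not listed here.

```python
import numpy as np


def simulate_infectives(beta_start, beta_end, gamma=1.0, mu=0.02,
                        n_steps=2000, dt=0.05, noise=1e-3, seed=0):
    """Euler-Maruyama simulation of a noisy SIR model (population fractions).

    The transmission rate beta drifts linearly from beta_start to beta_end.
    In this assumed model the disease-free state loses stability when
    beta / (gamma + mu) crosses 1 (transcritical bifurcation).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, n_steps)
    s, i = 1.0, 1e-3
    infectives = np.empty(n_steps)
    for k, beta in enumerate(betas):
        new_infections = beta * s * i
        ds = mu * (1.0 - s) - new_infections
        di = new_infections - (gamma + mu) * i
        s += ds * dt
        # Additive noise on the infectives (an illustrative choice).
        i += di * dt + noise * np.sqrt(dt) * rng.standard_normal()
        s = min(max(s, 0.0), 1.0)
        i = max(i, 0.0)  # keep the infected fraction non-negative
        infectives[k] = i
    return betas, infectives


def rolling_indicators(x, window):
    """Rolling variance and lag-1 autocorrelation over a sliding window."""
    n = len(x) - window + 1
    variance = np.empty(n)
    ac1 = np.empty(n)
    for j in range(n):
        w = x[j:j + window]
        w = w - w.mean()
        variance[j] = w.var()
        denom = np.sum(w * w)
        ac1[j] = np.sum(w[:-1] * w[1:]) / denom if denom > 0 else 0.0
    return variance, ac1


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


# Outbreak sequence: beta / (gamma + mu) crosses 1 during the run.
_, outbreak = simulate_infectives(0.5, 1.5, seed=1)
# Non-outbreak (null) sequence: beta / (gamma + mu) stays below 1.
_, null = simulate_infectives(0.5, 0.9, seed=2)

for name, series in [("outbreak", outbreak), ("null", null)]:
    var, ac1 = rolling_indicators(series, window=200)
    print(name, "final variance:", var[-1], "final lag-1 AC:", ac1[-1])
```

In a full study, indicators such as these would be computed for many simulated sequences, truncated before the bifurcation, and passed as features to a classifier whose scores could then be evaluated with a function like `auc`.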