Modeling epidemic spread is critical for informing policy decisions aimed at mitigation. In this work, we present a new data-driven method based on Gaussian process regression (GPR) that models epidemic spread through the difference of the number of infected cases on the logarithmic scale. We bound the variance of the predictions made by GPR, which quantifies the impact of epidemic data on the proposed model. Next, we derive a high-probability bound on the prediction error in terms of the distance between the training points and a testing point, the posterior variance, and the level of change in the spreading process, and we assess how the characteristics of the epidemic spread and the infection data influence this bound. We present examples that use GPR to model and predict epidemic spread with real-world infection data gathered in the UK during the COVID-19 epidemic. These examples illustrate that, under typical conditions, 94.29% of the noisy data over the next twenty days fall within the 95% confidence interval of the prediction, which supports the validity of these predictions. We further compare the modeling and prediction results with those of other methods, such as polynomial regression, k-nearest neighbors (KNN) regression, and neural networks, to demonstrate the benefits of leveraging GPR in disease spread modeling.
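To make the modeling pipeline described above concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a zero-mean GP with a squared-exponential kernel, fixed hyperparameters (`length`, `signal`, `noise`), and a small synthetic series `cases`; all of these names and values are hypothetical and chosen only for illustration. The sketch computes the log-scale differences of the case counts, fits GPR to the first part of the series, and reports the fraction of held-out noisy observations that fall inside the 95% interval.

```python
import math

def rbf(a, b, length=3.0, signal=1.0):
    """Squared-exponential kernel between two scalar time points (assumed kernel)."""
    return signal ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_T(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(t_train, y_train, t_test, noise=0.05):
    """Posterior mean and (latent) variance of a zero-mean GP at each test time."""
    n = len(t_train)
    K = [[rbf(t_train[i], t_train[j]) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_T(L, solve_lower(L, y_train))  # (K + noise^2 I)^{-1} y
    means, variances = [], []
    for t in t_test:
        k_star = [rbf(t, s) for s in t_train]
        v = solve_lower(L, k_star)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(rbf(t, t) - sum(vi * vi for vi in v))
    return means, variances

def coverage(y_obs, means, variances, noise=0.05, z=1.96):
    """Fraction of noisy observations inside the 95% predictive interval."""
    inside = sum(abs(y - m) <= z * math.sqrt(v + noise ** 2)
                 for y, m, v in zip(y_obs, means, variances))
    return inside / len(y_obs)

# Hypothetical daily infected counts (synthetic, for illustration only).
cases = [100, 112, 125, 141, 155, 171, 185, 199, 211, 221, 230, 236, 240]

# Difference on the logarithmic scale: y_k = log I_{k+1} - log I_k.
log_diff = [math.log(cases[k + 1] / cases[k]) for k in range(len(cases) - 1)]
t = list(range(len(log_diff)))

# Train on the first nine differences and predict the remaining three.
t_train, y_train = t[:9], log_diff[:9]
t_test, y_test = t[9:], log_diff[9:]
means, variances = gp_predict(t_train, y_train, t_test)
print("held-out coverage of the 95% interval:", coverage(y_test, means, variances))

# Predicted case counts follow from I_{k+1} = I_k * exp(y_k).
forecast = [cases[9]]
for m in means:
    forecast.append(forecast[-1] * math.exp(m))
```

In this sketch the posterior variance returns to the prior variance `signal ** 2` when a test point lies far from all training points, which is a simple instance of how the distance between training and testing points enters the prediction uncertainty. In practice the hyperparameters would be estimated from data rather than fixed by hand.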