Auxiliary data sources have become increasingly important in epidemiological surveillance, as they are often available at finer spatial and temporal resolution, with broader coverage and lower latency, than traditional surveillance signals. We describe the problem of spatial and temporal heterogeneity in signals derived from these data sources, that is, the presence of spatial and/or temporal biases. We present a method that uses a ``guiding'' signal to correct for these biases and produce a more reliable signal for modeling and forecasting. The method assumes that the heterogeneity can be approximated by a low-rank matrix and that the temporal heterogeneity is smooth over time. We also present a hyperparameter selection algorithm to choose the parameters that control the matrix rank and the degree of temporal smoothness of the corrections. In the absence of ground truth, we use maps and plots to argue that this method does indeed reduce heterogeneity. Reducing heterogeneity in auxiliary data sources greatly increases their utility in modeling and forecasting epidemics.
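The two modeling assumptions named in the abstract (a low-rank heterogeneity matrix and temporal smoothness) can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified stand-in that assumes the bias is well approximated by the raw difference between the auxiliary and guiding signals, uses a truncated SVD for the low-rank step, and uses a centered moving average in place of whatever smoothness penalty the paper actually employs. The function name `low_rank_smooth_correction`, the `(n_times, n_locations)` array layout, and the `rank` and `window` parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def low_rank_smooth_correction(aux, guide, rank=2, window=7):
    """Subtract a low-rank, temporally smoothed bias estimate from `aux`.

    Illustrative sketch only (not the paper's method).
    aux, guide: float arrays of shape (n_times, n_locations).
    rank:       number of singular components kept (assumed hyperparameter).
    window:     odd moving-average width in time steps (assumed hyperparameter).
    """
    aux = np.asarray(aux, dtype=float)
    guide = np.asarray(guide, dtype=float)
    if aux.shape != guide.shape:
        raise ValueError("aux and guide must have the same shape")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")

    # Assumption: the discrepancy from the guiding signal is a proxy for the bias.
    diff = aux - guide

    # Low-rank step: keep the leading `rank` singular components.
    u, s, vt = np.linalg.svd(diff, full_matrices=False)
    temporal = u[:, :rank] * s[:rank]   # shape (n_times, rank)
    spatial = vt[:rank, :]              # shape (rank, n_locations)

    # Smoothness step: centered moving average over each temporal factor,
    # with edge padding so the output keeps n_times rows.
    kernel = np.ones(window) / window
    pad = window // 2
    smoothed = np.empty_like(temporal)
    for j in range(temporal.shape[1]):
        padded = np.pad(temporal[:, j], pad, mode="edge")
        smoothed[:, j] = np.convolve(padded, kernel, mode="valid")

    correction = smoothed @ spatial
    return aux - correction, correction
```

In this sketch, `rank` and `window` play the role of the two hyperparameters the abstract mentions (matrix rank and degree of temporal smoothness); how the paper selects them is not reproduced here.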