Nowcasting and forecasting of infectious diseases have become increasingly important since the SARS-CoV-2 pandemic. In particular, methods for modeling the composition of circulating variants at a given time have seen wider use, in part because of a large increase in the frequency of genomic sequencing conducted as part of routine surveillance. However, such methods must account for the fact that locations differ in the amount of available data and sometimes in their trends. We discuss hierarchical multinomial logistic regression (HMLR), a commonly used method for forecasting SARS-CoV-2 variants that allows data to be shared across locations. We show how it has been used in the literature and define a class of HMLR models for SARS-CoV-2 variant nowcasting and forecasting. We rigorously test a subset of this class using the framework of the US SARS-CoV-2 Variant Nowcast Hub, a collaborative modeling project that launched in 2024. We created two years of weekly predictions based on retrospective datasets, with prediction dates ranging from Wednesday, August 3, 2022, to Wednesday, August 7, 2024, and tested 12 HMLR models against a baseline model on these datasets. The HMLR models outperformed the baseline both in probabilistic accuracy, as measured by the energy score, and in point accuracy, as measured by the Brier score. Overall, HMLR models performed best relative to the baseline model in locations with more data, and more complex HMLR models showed larger improvements in those high-data locations; however, no single model was best across all metrics, and simpler HMLR models performed better in low-data locations. We conclude that HMLR models perform well in practice for nowcasting and forecasting SARS-CoV-2 variants.