Inference of the reproduction number through time is of vital importance during an epidemic outbreak. Epidemiologists typically tackle this problem using observed prevalence or incidence data. However, prevalence and incidence data alone are often noisy or incomplete. Models can also suffer from identifiability issues: it can be difficult to determine whether a large fraction of a small epidemic or a small fraction of a large epidemic has been observed. Meanwhile, sequencing data are becoming increasingly abundant, so approaches that can incorporate genetic data are an active area of research. We propose using particle Markov chain Monte Carlo (MCMC) methods to infer the time-varying reproduction number from a combination of prevalence data reported at a set of discrete times and a dated phylogeny reconstructed from sequences. We validate our approach on simulated epidemics across a variety of scenarios. We then apply the method to a real data set of HIV-1 in North Carolina, USA, between 1957 and 2019.
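To make the particle MCMC idea concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It uses only the prevalence data and omits the dated phylogeny that the proposed method also conditions on. It assumes a linear birth-death model in which infected individuals are removed at rate `gamma` and transmit at rate `R_t * gamma`, with `R_t` held constant within each reporting interval and the dynamics approximated by tau-leaping. It also assumes a binomial reporting model with a known probability `rho`. All function names (`log_binom_pmf`, `poisson`, `step`, `pf_loglik`, `pmmh`) and the Normal(0, 1) prior on log R are hypothetical. A bootstrap particle filter gives an unbiased estimate of the likelihood (its log is returned here). Particle marginal Metropolis-Hastings (PMMH) plugs that estimate into an ordinary Metropolis-Hastings acceptance ratio.

```python
import math
import random


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) mass at k; -inf when k is impossible."""
    if k < 0 or k > n:
        return float("-inf")
    if p <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p >= 1.0:
        return 0.0 if k == n else float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def poisson(lam, rng):
    """Poisson draw by counting unit-rate exponential arrivals in [0, lam] (fine for small lam)."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def step(infected, r_t, gamma, dt, rng):
    """One tau-leap of a hypothetical linear birth-death model with birth rate r_t * gamma."""
    births = poisson(r_t * gamma * infected * dt, rng)
    p_remove = 1.0 - math.exp(-gamma * dt)
    removals = sum(rng.random() < p_remove for _ in range(infected))
    return infected + births - removals


def pf_loglik(observations, r_values, gamma, rho, i0, n_particles, rng, dt=1.0):
    """Bootstrap particle filter estimate of log p(y | R) for binomially reported prevalence."""
    particles = [i0] * n_particles
    loglik = 0.0
    for y, r_t in zip(observations, r_values):
        # Propagate every particle through the stochastic model to the next report time.
        particles = [step(i, r_t, gamma, dt, rng) for i in particles]
        # Weight each particle by the probability of the reported count y ~ Binomial(I, rho).
        logw = [log_binom_pmf(y, i, rho) for i in particles]
        m = max(logw)
        if m == float("-inf"):
            return m  # no particle is consistent with y
        w = [math.exp(lw - m) for lw in logw]
        # Log of the average weight, computed stably (log-sum-exp).
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


def pmmh(observations, n_iters, gamma, rho, i0, n_particles, step_sd, seed=0):
    """Particle marginal Metropolis-Hastings over one log R value per reporting interval."""
    rng = random.Random(seed)

    def log_prior(log_r):
        return sum(-0.5 * x * x for x in log_r)  # log R ~ Normal(0, 1), up to a constant

    def loglik(log_r):
        return pf_loglik(observations, [math.exp(x) for x in log_r],
                         gamma, rho, i0, n_particles, rng)

    current = [0.0] * len(observations)
    current_ll = loglik(current)
    samples = []
    for _ in range(n_iters):
        # Symmetric Gaussian random-walk proposal on log R.
        proposal = [x + rng.gauss(0.0, step_sd) for x in current]
        proposal_ll = loglik(proposal)
        log_alpha = proposal_ll + log_prior(proposal) - current_ll - log_prior(current)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_alpha:
            # Keep the accepted likelihood estimate (pseudo-marginal MCMC).
            current, current_ll = proposal, proposal_ll
        samples.append([math.exp(x) for x in current])
    return samples
```

For example, `pmmh([12, 15, 18, 22], n_iters=200, gamma=0.5, rho=0.5, i0=20, n_particles=100, step_sd=0.1)` returns 200 draws of the four interval-specific R values. Because the binomial reports depend on `I` and `rho` mainly through their product, fixing `rho` here sidesteps the identifiability problem described above rather than resolving it. In the proposed method, the dated phylogeny supplies information beyond the prevalence counts.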