In epidemiological studies, zero-inflated and hurdle models are commonly used to handle excess zeros in reported infectious disease cases. However, they cannot separately model the persistence (from presence to presence) and reemergence (from absence to presence) of a disease, even though covariates can sometimes have different effects on the two processes. Recently, a zero-inflated Markov switching negative binomial model was proposed to address this issue. We present a Markov switching negative binomial hurdle model as a competitor to that approach, since hurdle models are often used as alternatives to zero-inflated models for accommodating excess zeros. We begin the comparison by examining the assumptions underlying both models. Hurdle models assume perfect detection of disease cases, whereas zero-inflated models implicitly assume that case counts can be under-reported; we therefore investigate when a negative binomial distribution can approximate the true distribution of reported counts. We then compare the fit of the two types of Markov switching models on chikungunya cases across the neighborhoods of Rio de Janeiro. Among the fitted models, the zero-inflated Markov switching negative binomial model produces the best predictions, and both Markov switching models produce markedly better predictions than the more traditional negative binomial hurdle and zero-inflated models.
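To make the contrast between the two ways of handling excess zeros concrete, the following is a minimal Python sketch of the standard textbook zero-inflated and hurdle negative binomial probability mass functions, plus a one-step switching variant of the hurdle model. It is an illustration under stated assumptions, not the specification fitted in this work: the mean/dispersion parametrization (`mu`, `r`), the function names, and the rule that the previous state counts as "present" whenever the previous count is positive are all hypothetical choices made here. That last rule follows from the perfect-detection assumption of hurdle models; in a zero-inflated model the presence state would instead be latent.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and dispersion r (variance mu + mu**2 / r).

    This parametrization is an assumption of the sketch.
    """
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, p_present, mu, r):
    """Zero-inflated NB: a zero arises from absence OR from presence with no reported cases."""
    from_presence = p_present * nb_pmf(y, mu, r)
    if y == 0:
        return (1.0 - p_present) + from_presence
    return from_presence


def hurdle_nb_pmf(y, p_present, mu, r):
    """Hurdle NB: a zero arises only from absence; positive counts follow a zero-truncated NB."""
    if y == 0:
        return 1.0 - p_present
    return p_present * nb_pmf(y, mu, r) / (1.0 - nb_pmf(0, mu, r))


def switching_hurdle_pmf(y, y_prev, p_reemerge, p_persist, mu, r):
    """One-step switching hurdle NB (hypothetical helper).

    The presence probability is the persistence probability if the disease was
    observed at the previous time (y_prev > 0) and the reemergence probability
    otherwise. Treating y_prev > 0 as "present" relies on perfect detection.
    """
    p_present = p_persist if y_prev > 0 else p_reemerge
    return hurdle_nb_pmf(y, p_present, mu, r)
```

With the same presence probability, the zero-inflated model places more mass at zero than the hurdle model, because a zero count remains possible even when the disease is present. Letting `p_reemerge` and `p_persist` depend on covariates through separate regressions is what would allow a covariate to act differently on reemergence and persistence.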