An exploration into how susceptibility distribution misspecifications impact epidemic forecasting

Models of epidemic dynamics with heterogeneous susceptibility typically assume that individual susceptibility follows a gamma distribution, which permits analytical reduction to a low-dimensional system. However, the true distributional form in any given population is unknown. Here we investigate the consequences of misspecifying the susceptibility distribution by comparing gamma and lognormal specifications in a Susceptible-Exposed-Infectious-Removed (SEIR) framework. When the two distributions are matched on mean and coefficient of variation ($ν$), we find that their epidemic trajectories diverge once heterogeneity is moderate or high ($ν\gtrsim 1$), with the lognormal producing a later, larger peak and a greater final size. We then assess the impact of distributional misspecification on statistical inference by fitting correctly specified and misspecified models to synthetic datasets by maximum likelihood. In the default scenario, where inference is based on simulated data for a single epidemic, both models can reproduce the data, with compensation occurring through correlated shifts in the heterogeneity and intervention parameters. When inference is based on two simulated epidemics, however, this compensation may be reduced by known constraints on how parameters are related across epidemics. In these cases, the correctly specified model recovers all parameters accurately, while the misspecified model tends to give biased estimates. These inference biases propagate into forecasts, but predictions remain relatively accurate compared with those of homogeneous models, which, for instance, more than double peak incidence in scenarios where $ν\approx 1$. We conclude that the deviations resulting from the susceptibility distribution misspecifications assessed here are minor, and we encourage the adoption of heterogeneous models in future epidemic forecasting.
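
As an illustration of what matching on mean and coefficient of variation involves, the following minimal sketch computes gamma and lognormal parameters that share a given mean and $ν$, using the standard moment formulas for the two distributions. The function names and the shape/scale and $(\mu, \sigma)$ parameterizations are assumptions made for this example; the abstract does not state which parameterization the authors use.

```python
import math

# Hypothetical helpers for illustration only; names and parameterizations
# are assumptions, not taken from the paper.

def gamma_params(mean, cv):
    """Shape and scale of a gamma distribution with the given mean and CV."""
    shape = 1.0 / cv**2      # CV of a gamma is 1 / sqrt(shape)
    scale = mean * cv**2     # mean of a gamma is shape * scale
    return shape, scale

def lognormal_params(mean, cv):
    """mu and sigma of the underlying normal for a lognormal with given mean and CV."""
    sigma2 = math.log(1.0 + cv**2)        # CV^2 = exp(sigma^2) - 1
    mu = math.log(mean) - 0.5 * sigma2    # mean = exp(mu + sigma^2 / 2)
    return mu, math.sqrt(sigma2)

def gamma_moments(shape, scale):
    """Mean and CV implied by gamma shape and scale."""
    return shape * scale, 1.0 / math.sqrt(shape)

def lognormal_moments(mu, sigma):
    """Mean and CV implied by lognormal mu and sigma."""
    mean = math.exp(mu + 0.5 * sigma**2)
    cv = math.sqrt(math.exp(sigma**2) - 1.0)
    return mean, cv

if __name__ == "__main__":
    # Mean susceptibility normalized to 1; CV values spanning low to high heterogeneity.
    for cv in (0.5, 1.0, 2.0):
        g = gamma_params(1.0, cv)
        ln = lognormal_params(1.0, cv)
        print(f"cv={cv}: gamma(shape, scale)={g}, lognormal(mu, sigma)={ln}")
```

Both distributions then agree in their first two moments but differ in shape, for example in how much probability mass lies near zero susceptibility, and it is this kind of remaining difference that the comparison above concerns.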
