Practical identifiability and parameter estimation of compartmental epidemiological models

Practical parameter identifiability in ODE-based epidemiological models is a known issue, yet one that merits further study. Because real data contain noise and errors, the issue is essentially ubiquitous. In this study, to avoid uncertainty stemming from data of unknown quality, we use simulated data with added noise to investigate practical identifiability in two distinct epidemiological models. Particular emphasis is placed on the role of initial conditions, which are assumed unknown except for those that are directly measured. Rather than focusing on a single estimation method, we compare results from several widely used methods, including maximum likelihood and Markov chain Monte Carlo (MCMC) estimation. Among other findings, our analysis reveals that the MCMC estimator is overall more robust than the point estimators considered. Its estimates and predictions improve when the initial conditions of certain compartments are fixed so that the model becomes globally identifiable. For the point estimators, whether fixing or fitting the initial conditions that are not directly measured improves parameter estimates is model-dependent. Specifically, in the standard SEIR model, fixing the initial condition of the susceptible population, S(0), improved parameter estimates, whereas fixing the initial condition of the asymptomatic population in a more involved model did not. Our study also corroborates that the quality of parameter estimates changes depending on whether pre-peak or post-peak time series are used. Finally, our examples suggest that when the data are significantly noisy, the value of structural identifiability is moot.

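As a rough illustration of the workflow summarized above (simulate a model, add noise, then re-estimate a parameter with S(0) held fixed), the following is a minimal sketch rather than the study's actual procedure. All numerical values (population size, rates, noise level), the multiplicative Gaussian noise model, and the helper names `simulate_infectious`, `add_noise`, and `fit_beta` are assumptions introduced here for illustration. A plain least-squares grid search over the transmission rate stands in for the estimators compared in the study.

```python
import random


def seir_rhs(state, beta, sigma, gamma, n):
    """Right-hand side of the standard SEIR equations."""
    s, e, i, r = state
    new_infections = beta * s * i / n
    return (-new_infections,
            new_infections - sigma * e,
            sigma * e - gamma * i,
            gamma * i)


def rk4_step(state, h, *params):
    """Advance the state by one classical Runge-Kutta step of size h."""
    def shifted(k, frac):
        return tuple(x + frac * h * dx for x, dx in zip(state, k))

    k1 = seir_rhs(state, *params)
    k2 = seir_rhs(shifted(k1, 0.5), *params)
    k3 = seir_rhs(shifted(k2, 0.5), *params)
    k4 = seir_rhs(shifted(k3, 1.0), *params)
    return tuple(x + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_infectious(beta, sigma, gamma, s0, e0, i0, days, steps_per_day=10):
    """Return I(t) at t = 0, 1, ..., days (R(0) is assumed to be 0)."""
    n = s0 + e0 + i0
    state = (s0, e0, i0, 0.0)
    h = 1.0 / steps_per_day
    series = [state[2]]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, h, beta, sigma, gamma, n)
        series.append(state[2])
    return series


def add_noise(series, level, seed=0):
    """Multiplicative Gaussian noise (an assumed noise model)."""
    rng = random.Random(seed)
    return [x * (1.0 + level * rng.gauss(0.0, 1.0)) for x in series]


def fit_beta(observed, beta_grid, sigma, gamma, s0, e0, i0):
    """Least-squares grid search for beta with all other quantities fixed."""
    days = len(observed) - 1

    def sse(beta):
        model = simulate_infectious(beta, sigma, gamma, s0, e0, i0, days)
        return sum((m - o) ** 2 for m, o in zip(model, observed))

    return min(beta_grid, key=sse)


# Illustrative values only; none are taken from the study.
TRUE_BETA, SIGMA, GAMMA = 0.4, 0.2, 0.1
S0, E0, I0 = 9990.0, 0.0, 10.0

clean = simulate_infectious(TRUE_BETA, SIGMA, GAMMA, S0, E0, I0, days=100)
noisy = add_noise(clean, level=0.1, seed=0)
grid = [k / 100 for k in range(20, 61)]
print("estimated beta:", fit_beta(noisy, grid, SIGMA, GAMMA, S0, E0, I0))
```

Repeating the fit for different noise seeds, or with S(0) added to the search instead of being fixed, gives a crude view of how the spread of the estimates depends on which initial conditions are treated as known.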