The computational cost of exact likelihood evaluation for partially observed and highly heterogeneous individual-based models grows exponentially with the population size, so inference must rely on approximations. Sampling-based approaches to this problem, such as Sequential Monte Carlo or Approximate Bayesian Computation, usually require simulating every individual in the population multiple times, rely heavily on the design of bespoke proposal distributions or summary statistics, and can still scale poorly with population size. To overcome this, we propose a deterministic recursive approach to approximating the likelihood function using categorical distributions. The resulting algorithm has a computational cost as low as linear in the population size and is amenable to automatic differentiation, leading to simple algorithms for maximizing this approximate likelihood or sampling from posterior distributions. We prove consistency of the maximum approximate likelihood estimator of model parameters. We empirically test our approach on a range of models with various flavors of heterogeneity: different sets of disease states, individual-specific susceptibility and infectivity, spatial interaction mechanisms, under-reporting, and mis-reporting. We demonstrate strong calibration performance, in terms of log-likelihood variance and ground-truth recovery, and computational advantages over competitor methods. We conclude by illustrating the effectiveness of our approach in a real-world large-scale application using data from the 2001 foot-and-mouth disease outbreak in the United Kingdom.
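To make the idea of a deterministic recursion over categorical distributions concrete, here is a minimal sketch of our own construction. It is not necessarily the algorithm of the paper. It is a factorized (mean-field) filter for a hypothetical discrete-time SIR individual-based model with individual-specific susceptibility `susc` and infectivity `infec`, plus an emission matrix `G` that encodes under-reporting. Each individual's state is summarized by a categorical distribution over {S, I, R}. At every step these marginals are pushed through a transition matrix whose infection probability depends on the expected infectious pressure, then reweighted by the observation probabilities. The log of the per-individual normalizing constants is accumulated as the approximate log-likelihood, and each step costs O(N). The model, the function name `categorical_filter_loglik`, the simulated data, and all parameter values are illustrative assumptions.

```python
import numpy as np

def categorical_filter_loglik(y, pi0, beta, gamma, susc, infec, G):
    """Approximate log-likelihood via a factorized categorical filter (illustrative sketch).

    y:      (T, N) int array, observed symbol for each individual at each step
    pi0:    (N, 3) initial marginal probabilities over states (S, I, R)
    beta:   transmission rate; gamma: per-step recovery probability
    susc, infec: (N,) individual susceptibility and infectivity (hypothetical model)
    G:      (3, M) emission matrix, G[x, o] = P(observe symbol o | state x)
    """
    pi = np.asarray(pi0, dtype=float)
    loglik = 0.0
    for t in range(y.shape[0]):
        # Expected infectious pressure on each individual, excluding itself: O(N) work.
        total = np.dot(infec, pi[:, 1])
        pressure = beta * (total - infec * pi[:, 1]) / pi.shape[0]
        p_inf = 1.0 - np.exp(-susc * pressure)
        # Prediction: push each marginal through its own S -> I -> R transition matrix.
        pred = np.stack([
            pi[:, 0] * (1.0 - p_inf),
            pi[:, 0] * p_inf + pi[:, 1] * (1.0 - gamma),
            pi[:, 1] * gamma + pi[:, 2],
        ], axis=1)
        # Update: weight by P(observed symbol | state) and renormalize per individual.
        unnorm = pred * G[:, y[t]].T
        z = unnorm.sum(axis=1)
        loglik = loglik + np.log(z).sum()
        pi = unnorm / z[:, None]
    return loglik

# Usage: simulate the hypothetical model, then evaluate the approximate likelihood.
rng = np.random.default_rng(0)
N, T, beta_true, gamma_true, q = 500, 15, 1.5, 0.2, 0.6  # q: assumed reporting probability
susc = rng.gamma(2.0, 0.5, size=N)
infec = rng.gamma(2.0, 0.5, size=N)
x = (rng.random(N) < 0.05).astype(int)  # 0 = S, 1 = I, 2 = R
ys = []
for _ in range(T):
    inf = (x == 1).astype(float)
    pressure = beta_true * (infec @ inf - infec * inf) / N
    p = 1.0 - np.exp(-susc * pressure)
    u = rng.random(N)
    x = np.where((x == 0) & (u < p), 1, np.where((x == 1) & (u < gamma_true), 2, x))
    # Symbol 1 = "reported infectious", symbol 0 = "not reported" (under-reporting only).
    ys.append(((x == 1) & (rng.random(N) < q)).astype(int))
y = np.array(ys)

G = np.array([[1.0, 0.0], [1.0 - q, q], [1.0, 0.0]])
pi0 = np.tile([0.95, 0.05, 0.0], (N, 1))
ll = categorical_filter_loglik(y, pi0, beta_true, gamma_true, susc, infec, G)

# Crude grid search over beta, standing in for gradient-based maximization.
betas = np.linspace(0.5, 3.0, 26)
lls = [categorical_filter_loglik(y, pi0, b, gamma_true, susc, infec, G) for b in betas]
beta_hat = betas[int(np.argmax(lls))]
```

The recursion uses only array operations that also exist in `jax.numpy`, so a JAX port could in principle be differentiated with `jax.grad` for gradient-based optimization or posterior sampling. With a single individual there are no interactions, and the recursion reduces to the exact forward algorithm of a hidden Markov model.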