Hawkes stochastic point process models have emerged as valuable statistical tools for analyzing viral contagion. The spatiotemporal Hawkes process characterizes the speeds at which viruses spread within human populations. Unfortunately, likelihood-based inference using these models requires $O(N^2)$ floating-point operations, where $N$ is the number of observed cases. Recent work responds to the Hawkes likelihood's computational burden by developing efficient graphics processing unit (GPU)-based routines that enable Bayesian analysis of tens of thousands of observations. We build on this work and develop a high-performance computing (HPC) strategy that divides 30 Markov chains among 4 GPU nodes, each of which uses multiple GPUs to accelerate its chains' likelihood computations. We use this framework to apply two spatiotemporal Hawkes models to the analysis of one million COVID-19 cases in the United States between March 2020 and June 2023. In addition to brute-force HPC, we advocate for two simple strategies as scalable alternatives to successful approaches proposed for small data settings. First, we use known county-specific population densities to build a spatially varying triggering kernel in a manner that avoids computationally costly nearest-neighbor searches. Second, we use a cut-posterior inference routine that accounts for infections' spatial location uncertainty by iteratively sampling latent locations uniformly within their respective counties of occurrence, thereby avoiding full-blown latent variable inference for 1,000,000 infection locations.
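To make the $O(N^2)$ cost concrete, the following is a minimal NumPy sketch of the pairwise term of a spatiotemporal Hawkes log-likelihood. It is not the authors' GPU implementation. It assumes a constant background rate, an exponential temporal kernel, and an isotropic Gaussian spatial kernel, none of which are specified in the abstract. The function name `hawkes_log_intensity_sum` and the parameters `mu`, `theta`, `omega`, and `h` are illustrative, and the compensator (integral) term of the full likelihood is omitted.

```python
import numpy as np


def hawkes_log_intensity_sum(times, locs, mu, theta, omega, h):
    """Sum of log conditional intensities at the observed events.

    Assumed (hypothetical) model:
      lambda(x_i, t_i) = mu + theta * sum_{j: t_j < t_i} g_t(t_i - t_j) * g_s(x_i - x_j)
    with g_t an exponential density with rate `omega` and g_s an isotropic
    2-D Gaussian density with bandwidth `h`.
    """
    times = np.asarray(times, dtype=float)  # shape (N,)
    locs = np.asarray(locs, dtype=float)    # shape (N, 2)

    # All pairwise time lags and squared distances: this is the O(N^2) step.
    dt = times[:, None] - times[None, :]                     # (N, N)
    d2 = np.sum((locs[:, None, :] - locs[None, :, :]) ** 2, axis=-1)  # (N, N)

    # Only strictly earlier events can trigger later ones.
    past = dt > 0

    # Mask the lags before exponentiating to avoid overflow for dt < 0.
    temporal = omega * np.exp(-omega * np.where(past, dt, 0.0))
    spatial = np.exp(-d2 / (2.0 * h**2)) / (2.0 * np.pi * h**2)
    trigger = np.where(past, temporal * spatial, 0.0)

    # Conditional intensity at each event, then the log-likelihood's sum term.
    lam = mu + theta * trigger.sum(axis=1)
    return float(np.log(lam).sum())
```

The dense $N \times N$ arrays in this sketch are only feasible for small $N$. At the scale of one million cases, the same pairwise sum would have to be tiled and distributed across accelerators, which is the problem the HPC strategy in the abstract addresses.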