Data analysis and individual policy-level modeling for insurance require working with data that exhibit strong spatiotemporal correlations, non-Gaussian distributions, and relatively large volume with hierarchical structure. In this work, we show that by employing gradient-based Markov chain Monte Carlo (MCMC) accelerated by graphics processing units (GPUs), previous tradeoffs between rich model structure and scalability of inference no longer exist at the million-record level. By writing our model in NumPyro, we are able to use its off-the-shelf MCMC capabilities to fit a model with several nontrivial components, including a latent conditional autoregression and a spline-based exposure adjustment, with a speedup of 88% relative to a CPU-based implementation. We employ this model in a case study of 2.6 million policy-level claim count records from automobile insurance in Brazil in 2011. We highlight how this modeling workflow can substantially enhance ongoing efforts towards risk assessment for highly multivariate, correlated outcomes. Code and data are available at https://github.com/ckrapu/bayes-at-scale.