Double generalized linear models provide a flexible framework for modeling data by allowing both the mean and the dispersion to vary across observations. Common members of the exponential dispersion family, including the Gaussian, Poisson, compound Poisson-gamma (CP-g), gamma, and inverse-Gaussian distributions, are known to admit such models. Their limited use can be attributed to ambiguities in model specification when the number of covariates is large, and to complications that arise when the data display complex spatial dependence. In this work we consider a hierarchical specification for the CP-g model with a spatial random effect. The spatial effect supports uncertainty quantification by modeling the dependence that arises from the location-based indexing of the response. We focus on a Gaussian process specification for the spatial effect. Simultaneously, we address model specification for such models through Bayesian variable selection, implemented via a continuous spike-and-slab prior on the model parameters, specifically the fixed effects. The novelty of our contribution lies in the Bayesian frameworks developed for these models. We perform several synthetic experiments to demonstrate the accuracy of our frameworks, and then apply them to analyze automobile insurance premiums in Connecticut for the year 2008.
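The hierarchical structure described above can be illustrated by simulating data from it. The following is a minimal NumPy sketch of the data-generating process only (not the Bayesian fitting procedure), under assumptions that are ours and not necessarily the authors': log links for both the mean and the dispersion, a Tweedie power parameter `xi` in (1, 2) for the CP-g response, an exponential covariance kernel for the Gaussian process, and a two-normal mixture (George-McCulloch style) for the continuous spike-and-slab prior. All function names and hyperparameter values (`sample_cpg`, `sample_gp`, `sample_spike_slab`, `simulate_spatial_dglm`, `v0`, `v1`, `rho`, `sigma2`) are hypothetical.

```python
import numpy as np


def sample_cpg(mu, phi, xi, rng):
    """Draw compound Poisson-gamma (Tweedie, 1 < xi < 2) variates with
    mean mu and variance phi * mu**xi, as a Poisson sum of gamma jumps."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    phi = np.broadcast_to(np.asarray(phi, dtype=float), mu.shape)
    lam = mu ** (2 - xi) / (phi * (2 - xi))   # Poisson rate of the jump count N
    alpha = (2 - xi) / (xi - 1)               # gamma shape of each jump
    scale = phi * (xi - 1) * mu ** (xi - 1)   # gamma scale of each jump
    n_jumps = rng.poisson(lam)
    y = np.zeros(mu.shape)
    pos = n_jumps > 0                         # N = 0 yields an exact zero
    # A sum of N iid Gamma(alpha, scale) draws is Gamma(N * alpha, scale).
    y[pos] = rng.gamma(n_jumps[pos] * alpha, scale[pos])
    return y


def sample_gp(coords, sigma2, rho, rng, jitter=1e-6):
    """Zero-mean Gaussian process draw at (n, 2) coordinates with the
    (assumed) exponential covariance sigma2 * exp(-d / rho)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma2 * np.exp(-d / rho) + jitter * np.eye(len(coords))
    chol = np.linalg.cholesky(cov)            # cov = chol @ chol.T
    return chol @ rng.standard_normal(len(coords))


def sample_spike_slab(p, q, v0, v1, rng):
    """Continuous spike-and-slab draw: delta_j ~ Bernoulli(q), then
    beta_j ~ N(0, v1) if delta_j (slab) else N(0, v0) (spike), v0 << v1."""
    delta = rng.random(p) < q
    sd = np.where(delta, np.sqrt(v1), np.sqrt(v0))
    return rng.normal(0.0, sd), delta


def simulate_spatial_dglm(n=300, p=5, xi=1.5, seed=0):
    """Simulate a hypothetical spatial CP-g double GLM:
    log mu = X beta + w(s),  log phi = Z gamma,  y ~ CP-g(mu, phi, xi)."""
    rng = np.random.default_rng(seed)
    coords = rng.random((n, 2))               # locations in the unit square
    X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
    beta, delta = sample_spike_slab(p, q=0.5, v0=1e-4, v1=1.0, rng=rng)
    Z = X[:, :2]                              # illustrative dispersion covariates
    gamma = np.array([0.0, 0.3])              # illustrative dispersion coefficients
    w = sample_gp(coords, sigma2=0.5, rho=0.2, rng=rng)
    mu = np.exp(X @ beta + w)                 # log link for the mean
    phi = np.exp(Z @ gamma)                   # log link for the dispersion
    y = sample_cpg(mu, phi, xi, rng)
    return y, X, coords, beta, delta
```

The Poisson-sum construction makes the point mass at zero explicit: an observation is exactly zero whenever the latent jump count is zero, which happens with probability exp(-λ). This mix of exact zeros and positive continuous values is what makes the CP-g model natural for insurance premiums and claims.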