Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff

Insurers usually turn to generalized linear models (GLMs) for modeling claim frequency and severity data. Following their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of four model classes: a GLM on binned input data, a gradient-boosted tree model (GBM), a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). A CANN combines a baseline prediction, established with either a GLM or a GBM, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes and numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network, and we explore their potential advantages in a frequency-severity setting. Model performance is evaluated not only on out-of-sample deviance but also with statistical and calibration performance criteria and managerial tools, to obtain more nuanced insights. Finally, we construct global surrogate models for the neural networks' frequency and severity models. These surrogates translate the essential insights captured by the FFNNs or CANNs into GLMs. The result is a technical tariff table that can easily be deployed in practice.
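To make the CANN idea concrete, the following is a minimal NumPy sketch rather than the paper's implementation. It assumes a log link, so that the neural network correction multiplies the baseline prediction, and it assumes that the technical premium is the product of the predicted frequency and the predicted severity. The function names (`poisson_deviance`, `cann_predict`), the single tanh hidden layer, and all numeric values are hypothetical and chosen for illustration only.

```python
import numpy as np


def poisson_deviance(y, mu, eps=1e-12):
    """Mean Poisson deviance, using the convention y * log(y / mu) = 0 when y = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Replace the ratio by 1 where y == 0 so that the log term vanishes there.
    ratio = np.where(y > 0, y / np.maximum(mu, eps), 1.0)
    return 2.0 * np.mean(y * np.log(ratio) - (y - mu))


def cann_predict(baseline_mu, X, W1, b1, w2, b2):
    """CANN-style prediction (assumed form): baseline * exp(network correction)."""
    hidden = np.tanh(X @ W1 + b1)      # one hidden layer, illustrative only
    correction = hidden @ w2 + b2      # correction on the log scale
    return baseline_mu * np.exp(correction)


# Hypothetical inputs: 5 policies, 3 preprocessed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
baseline_freq = np.array([0.05, 0.10, 0.08, 0.12, 0.07])  # e.g. from a GLM or GBM

# With zero output weights the correction is 0, so the CANN starts at the baseline.
W1 = rng.normal(size=(3, 4))
b1 = np.zeros(4)
w2 = np.zeros(4)
b2 = 0.0
freq_pred = cann_predict(baseline_freq, X, W1, b1, w2, b2)

# Assumed frequency-severity decomposition of the technical premium.
sev_pred = np.full(5, 1000.0)          # hypothetical severity predictions
pure_premium = freq_pred * sev_pred

# Out-of-sample deviance on hypothetical observed claim counts.
claims = np.array([0, 1, 0, 0, 0])
deviance = poisson_deviance(claims, freq_pred)
```

Initializing the output layer at zero means training starts exactly from the GLM or GBM baseline, so under these assumptions the network only has to learn what the baseline misses.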

