One of the main tasks of actuaries and data scientists is to build good predictive models for phenomena such as the claim size or the number of claims in insurance. Ideally, these models exploit the available feature information to improve predictive accuracy. This user guide revisits and clarifies statistical techniques for two purposes: assessing the calibration or adequacy of a model, and comparing and ranking different models. In doing so, it emphasises the importance of specifying the prediction target functional (e.g. the mean or a quantile) a priori, and of choosing a scoring function for model comparison that is in line with this target functional. Guidance for the practical choice of the scoring function is provided. Striving to bridge the gap between science and daily practice, the guide focuses mainly on the pedagogical presentation of existing results and of best practice. The results are accompanied and illustrated by two real-data case studies on workers' compensation and customer churn.
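To make the link between target functional and scoring function concrete, the following minimal sketch compares constant predictions under two scoring functions. It assumes the squared error as the scoring function for the mean and the pinball (quantile) loss for a quantile. The data are simulated from a gamma distribution purely for illustration and are not taken from the case studies; the helper names `squared_error` and `pinball_loss` are hypothetical.

```python
import numpy as np


def squared_error(y, z):
    """Squared error; its expected value is minimised by the mean."""
    return (y - z) ** 2


def pinball_loss(y, z, alpha):
    """Pinball loss; its expected value is minimised by the alpha-quantile."""
    return np.where(y >= z, alpha * (y - z), (1 - alpha) * (z - y))


# Simulated right-skewed "claim sizes" (an assumption, for illustration only).
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=500.0, size=10_000)

alpha = 0.9
# Three constant predictions, each aiming at a different functional.
candidates = {
    "mean": y.mean(),
    "median": np.median(y),
    "q90": np.quantile(y, alpha),
}

# Average score of each candidate under each scoring function.
mse = {name: squared_error(y, z).mean() for name, z in candidates.items()}
pin = {name: pinball_loss(y, z, alpha).mean() for name, z in candidates.items()}

best_for_mse = min(mse, key=mse.get)
best_for_pinball = min(pin, key=pin.get)

print("Best under squared error:", best_for_mse)
print("Best under pinball loss (alpha=0.9):", best_for_pinball)
```

The ranking of the same three predictions changes with the scoring function, which is why the target functional should be fixed before models are compared.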