Data Mining of Telematics Data: Unveiling the Hidden Patterns in Driving Behaviour

With advances in technology, telematics data, which capture vehicle movement information, are becoming available to more insurers. Because these data record actual driving behaviour, they are expected to improve our understanding of driving risk and to facilitate more accurate auto-insurance ratemaking. In this paper, we analyze an auto-insurance dataset with telematics data collected from a major European insurer. Through a detailed discussion of the telematics data structure and related data quality issues, we elaborate on the practical challenges of processing telematics information and incorporating it in loss modelling and ratemaking. Then, with an exploratory data analysis, we demonstrate that individual driving behaviour is heterogeneous, even within the groups of policyholders with and without claims, which supports the study of telematics data. Our regression analysis reiterates the importance of telematics data in claims modelling; in particular, we propose a speed transition matrix that describes discretely recorded speed time series and produces statistically significant predictors for claim counts. We conclude that large speed transitions, together with a higher maximum speed attained, nighttime driving and increased harsh braking, are associated with increased claim counts. Moreover, we empirically illustrate the learning effects in driving behaviour: we show that neither severe harsh events detected at a high threshold nor expected claim counts are directly proportional to driving time or distance; instead, both increase at a decreasing rate.
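To make the idea of a speed transition matrix concrete, the sketch below shows one plausible way to build such a matrix from a discretely recorded speed series. The abstract does not specify the construction, so the function name `speed_transition_matrix`, the bin edges, the use of consecutive recordings as the transition step, and the row normalisation are all assumptions made for illustration, not the paper's actual method.

```python
import numpy as np

def speed_transition_matrix(speeds, bin_edges):
    """Count moves between speed bins at consecutive recordings.

    Hypothetical helper: the binning and normalisation are assumptions,
    not the construction used in the paper.
    """
    speeds = np.asarray(speeds, dtype=float)
    # Map each speed to a bin index in 0..len(bin_edges)
    states = np.digitize(speeds, bin_edges)
    n_states = len(bin_edges) + 1
    counts = np.zeros((n_states, n_states), dtype=int)
    # np.add.at accumulates repeated (from, to) index pairs correctly
    np.add.at(counts, (states[:-1], states[1:]), 1)
    # Row-normalise to empirical transition frequencies; empty rows stay zero
    row_totals = counts.sum(axis=1, keepdims=True)
    probs = np.divide(
        counts,
        row_totals,
        out=np.zeros(counts.shape, dtype=float),
        where=row_totals > 0,
    )
    return counts, probs

# Toy speed series in km/h (made-up values) with assumed bins
# [0, 30), [30, 60) and [60, inf)
speeds = [0, 12, 35, 62, 58, 20, 0]
counts, probs = speed_transition_matrix(speeds, bin_edges=[30, 60])
print(counts)
print(probs.round(3))
```

Entries far from the diagonal correspond to large speed changes between consecutive recordings, so summaries of those entries could serve as candidate predictors in a claim-count regression of the kind the abstract describes.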
