Machine learning (ML) is likely to play a large role in many future processes, including those of insurance companies. However, ML models are at risk of being attacked and manipulated. In this work, the robustness of Gradient Boosted Decision Tree (GBDT) models and Deep Neural Networks (DNNs) is evaluated in an insurance context. To this end, two GBDT models and two DNNs are trained on two different tabular datasets from the insurance domain. Past research in this area has mainly used homogeneous data, and there are comparatively few insights regarding heterogeneous tabular data. The ML tasks performed on the datasets are claim prediction (regression) and fraud detection (binary classification). For the backdoor attacks, samples containing a specific pattern were crafted and added to the training data. It is shown that this type of attack can be highly successful even with few added samples. The backdoor attacks worked well on the models trained on one dataset but poorly on the models trained on the other. In real-world scenarios an attacker would face several obstacles, but since the attacks can succeed with very few added samples, this risk should be evaluated.
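The poisoning procedure described above, crafting samples that carry a fixed trigger pattern and adding them to the training data, can be sketched as follows. This is a minimal illustration on synthetic tabular data with a GBDT classifier; the dataset, the chosen feature indices, the trigger values, and the number of poisoned samples are assumptions for illustration, not the setup used in this work.

```python
# Sketch of a backdoor (data-poisoning) attack on a GBDT classifier.
# Feature indices and trigger values below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a heterogeneous tabular fraud-detection dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def add_trigger(X):
    """Stamp the backdoor pattern: fixed out-of-range values in two features."""
    Xp = X.copy()
    Xp[:, 0] = 8.0    # assumed trigger value, far outside the clean data range
    Xp[:, 1] = -8.0
    return Xp

# Poison only a small number of training rows: add the trigger, force label 1.
n_poison = 20
idx = rng.choice(len(X_train), size=n_poison, replace=False)
X_poison = add_trigger(X_train[idx])
y_poison = np.ones(n_poison, dtype=int)

X_aug = np.vstack([X_train, X_poison])
y_aug = np.concatenate([y_train, y_poison])

model = GradientBoostingClassifier(random_state=0).fit(X_aug, y_aug)

# Clean accuracy should stay close to that of an unpoisoned model,
# which is what makes the backdoor hard to notice.
clean_acc = model.score(X_test, y_test)

# Attack success rate: fraction of triggered class-0 samples predicted as 1.
X_trig = add_trigger(X_test[y_test == 0])
asr = (model.predict(X_trig) == 1).mean()
print(f"clean accuracy: {clean_acc:.2f}, attack success rate: {asr:.2f}")
```

Because the trigger values lie outside the clean feature range, a handful of poisoned rows is often enough for the trees to dedicate a split to the pattern, which mirrors the abstract's finding that very few added samples can suffice.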