The insurance industry relies heavily on predicting risk from the characteristics of potential customers. Although such models are widely used, researchers have long pointed out that these practices perpetuate discrimination based on sensitive features such as gender or race. Because this discrimination can often be traced to biases in historical data, eliminating it, or at least mitigating it, is desirable. With the shift from traditional models to machine-learning-based predictions, calls for stronger mitigation have been renewed, since simply excluding sensitive variables from the pricing process can be shown to be ineffective. In this article, we first investigate why predictions are a necessity within the industry and why correcting biases is not as straightforward as simply identifying a sensitive variable. We then propose mitigating these biases with Wasserstein barycenters rather than simple scaling. To demonstrate the effects and effectiveness of the approach, we apply it to real data and discuss its implications.
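The abstract names the technique but does not detail it. For one-dimensional predictions (for example, a predicted claim frequency or premium), the Wasserstein barycenter of the group-wise prediction distributions has a closed form: its quantile function is the weighted average of the groups' quantile functions. A prediction f(x, s) for an individual in group s can then be moved to the same quantile level in the barycenter, i.e. f_fair(x, s) = sum over s' of p_{s'} Q_{s'}(F_s(f(x, s))), where F_s and Q_s are the CDF and quantile function of predictions in group s. The sketch below is a minimal in-sample illustration under loudly stated assumptions: a single discrete sensitive attribute, weights p_{s'} equal to the empirical group proportions, and raw empirical CDFs and quantiles with no smoothing or tie-breaking. The helper names are hypothetical, and this is not necessarily the exact procedure used in the article.

```python
import math
from bisect import bisect_right
from collections import defaultdict
from fractions import Fraction


def empirical_cdf_level(sorted_vals, x):
    """Empirical CDF F(x) = fraction of values <= x, kept as an exact Fraction."""
    return Fraction(bisect_right(sorted_vals, x), len(sorted_vals))


def empirical_quantile(sorted_vals, u):
    """Generalized inverse of the empirical CDF: smallest value v with F(v) >= u."""
    n = len(sorted_vals)
    k = min(n - 1, max(0, math.ceil(u * n) - 1))
    return sorted_vals[k]


def barycenter_adjust(preds, groups):
    """Map each prediction onto the 1-D Wasserstein barycenter of the group distributions.

    Hypothetical helper (assumption): weights are the empirical group proportions,
    and the adjustment is computed in-sample with no smoothing or tie-breaking.
    """
    by_group = defaultdict(list)
    for p, g in zip(preds, groups):
        by_group[g].append(p)
    sorted_by_group = {g: sorted(v) for g, v in by_group.items()}
    n = len(preds)
    weights = {g: len(v) / n for g, v in by_group.items()}

    adjusted = []
    for p, g in zip(preds, groups):
        # Quantile level of this prediction within its own group.
        u = empirical_cdf_level(sorted_by_group[g], p)
        # Barycenter quantile at level u: weighted average of every group's quantile.
        adjusted.append(
            sum(w * empirical_quantile(sorted_by_group[h], u) for h, w in weights.items())
        )
    return adjusted


# Toy data (illustrative only): group "b" is shifted two units above group "a".
preds = [1, 2, 3, 4, 3, 4, 5, 6]
groups = ["a"] * 4 + ["b"] * 4
print(barycenter_adjust(preds, groups))
# → [2.0, 3.0, 4.0, 5.0, 2.0, 3.0, 4.0, 5.0]
```

After the adjustment, both groups share the same distribution of predictions, while each individual keeps their rank within their own group. Using exact `Fraction` quantile levels avoids off-by-one indexing from floating-point rounding when groups have different sizes.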