Machine Learning is attracting growing interest from the general public, and in recent years numerous press articles have questioned its objectivity: racism, sexism, \dots With regulators paying increasing attention to the ethical use of data in insurance, the actuarial community must rethink its pricing and risk selection practices to make insurance fairer. Equity is a philosophical concept whose many definitions vary across jurisdictions and influence one another, without yet reaching consensus. In Europe, the Charter of Fundamental Rights defines guidelines on discrimination, and the use of sensitive personal data in algorithms is regulated. Although simply removing the protected variables prevents so-called `direct' discrimination, models can still `indirectly' discriminate between individuals through latent interactions between variables, which improve performance (and therefore the quantification of risk, the segmentation of prices, and so on). After introducing the key concepts related to discrimination, we illustrate how complex they are to quantify. We then propose an innovative method, not yet found in the literature, that reduces the risk of indirect discrimination using concepts from linear algebra. This technique is illustrated on a concrete case of risk selection in life insurance, demonstrating its ease of use and its promising performance.