This paper addresses significant obstacles that arise from the widespread use of machine learning models in the insurance industry, with a specific focus on promoting fairness. The first challenge is to make effective use of unlabeled insurance data while reducing labeling effort and prioritizing the most relevant data through active learning. The paper explores several active learning sampling strategies and evaluates their impact on both synthetic and real insurance datasets. This analysis highlights how difficult it is to obtain fair model inferences, since machine learning models may replicate the biases and discrimination present in the underlying data. To tackle these interconnected challenges, the paper introduces an innovative fair active learning method. The proposed approach samples instances that are both informative and fair, achieving a good balance between predictive performance and fairness, as confirmed by numerical experiments on insurance datasets.
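The idea of sampling instances that are both informative and fair can be illustrated with a small sketch. The abstract does not specify the paper's actual selection criterion, so everything below is an assumption made for illustration: informativeness is measured by the entropy of a binary classifier's predicted probability (standard uncertainty sampling), and "fairness" is approximated by a simple bonus for candidates whose sensitive group is under-represented in the labeled pool. The function names `binary_entropy`, `fair_uncertainty_scores`, and `select_batch`, and the trade-off weight `lam`, are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction; maximal (1.0) at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def fair_uncertainty_scores(proba, group, labeled_group, lam=0.5):
    """Score unlabeled candidates by uncertainty plus a group-balance bonus.

    proba         -- predicted probability of the positive class per candidate
    group         -- sensitive-attribute value per candidate
    labeled_group -- sensitive-attribute values of already-labeled instances
    lam           -- hypothetical weight: 0 = pure uncertainty, 1 = pure balance
    """
    counts = Counter(labeled_group)
    n_labeled = len(labeled_group)
    scores = []
    for p, g in zip(proba, group):
        informativeness = binary_entropy(p)
        # Share of the candidate's group among labeled data (0 if nothing labeled yet).
        share = counts[g] / n_labeled if n_labeled else 0.0
        balance_bonus = 1.0 - share  # larger for under-represented groups
        scores.append((1 - lam) * informativeness + lam * balance_bonus)
    return scores


def select_batch(scores, k):
    """Return the indices of the k highest-scoring candidates (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


# Toy usage: four candidates, two groups; group 1 is rare in the labeled pool.
scores = fair_uncertainty_scores(
    proba=[0.5, 0.5, 0.9, 0.1],
    group=[0, 1, 0, 1],
    labeled_group=[0, 0, 0, 1],
    lam=0.5,
)
print(select_batch(scores, 2))  # -> [1, 0]
```

With `lam=0` the ranking reduces to plain uncertainty sampling; raising `lam` shifts queries toward the under-represented group. A group-balance bonus is only one of many possible stand-ins for a fairness criterion, and the experiments described in the abstract may rely on a different one.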