Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry

This report demonstrates the benefits, measured as improved claims loss modeling, of using Federated Learning (FL) to learn a single model across multiple insurance industry datasets without requiring the datasets themselves to be shared from one company to another. FL addresses two of the most pressing concerns: limited data volume and limited data variety, which stem from privacy concerns, the rarity of claim events, the lack of informative rating factors, and similar constraints. During each round of FL, collaborators compute improvements to the model using their local private data, and these updates are combined to update a global model. Aggregating updates in this way yields more effective claims loss forecasts than models trained individually at each collaborator. Critically, this approach enables machine learning collaboration without raw data ever leaving the compute infrastructure of its data owner. Additionally, OpenFL, the open-source framework used in our experiments, is designed to run under confidential computing and to support additional algorithmic protections against leakage of information through the shared model updates. In this way, FL serves as a privacy-enhancing collaborative learning technique that addresses the challenges that data sensitivity and privacy pose for traditional machine learning solutions. The application of FL presented here can also be extended to other areas, such as fraud detection and catastrophe modeling, that have a similar need to incorporate data privacy into machine learning collaborations. Our framework and empirical results provide a foundation for future collaborations among insurers, regulators, academic researchers, and InsurTech experts.
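
The round structure described above (local computation on private data, followed by aggregation into a global model) can be sketched in a few lines. The example below is a minimal illustration in plain Python and does not use the OpenFL API. It assumes a simple linear model, synthetic data standing in for each collaborator's private claims records, and aggregation by dataset-size-weighted averaging (the federated averaging scheme), which the text above does not specify. All function names and parameter values are hypothetical.

```python
import random


def local_update(weights, data, lr=0.5, epochs=5):
    """One collaborator's local step: a few passes of gradient descent on a
    linear model y ~ w0 + w1 * x, using only that collaborator's data."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error (up to a constant factor).
        g0 = sum((w0 + w1 * x - y) for x, y in data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return (w0, w1)


def federated_average(updates, sizes):
    """Combine local models into a global one, weighting each collaborator
    by its number of records (assumed aggregation rule)."""
    total = sum(sizes)
    return tuple(
        sum(u[i] * s for u, s in zip(updates, sizes)) / total
        for i in range(2)
    )


def run_federation(datasets, rounds=200):
    """Run FL rounds: only model weights move; raw data stays local."""
    global_w = (0.0, 0.0)
    sizes = [len(d) for d in datasets]
    for _ in range(rounds):
        updates = [local_update(global_w, d) for d in datasets]
        global_w = federated_average(updates, sizes)
    return global_w


def make_private_data(rng, n, lo, hi):
    """Synthetic stand-in for private data: y = 2 + 3x + small noise, with
    each collaborator seeing only part of the feature range."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 + 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]


rng = random.Random(0)
datasets = [
    make_private_data(rng, 40, 0.0, 0.5),  # collaborator A
    make_private_data(rng, 60, 0.5, 1.0),  # collaborator B
    make_private_data(rng, 20, 0.0, 1.0),  # collaborator C
]
global_w = run_federation(datasets)
print("global model (intercept, slope):", global_w)
```

In this sketch each collaborator observes a different slice of the feature range, a toy version of the limited data variety mentioned above. Only the weight tuples returned by `local_update` are exchanged, never the `(x, y)` records themselves.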
