Reinsurance optimization is a cornerstone of solvency and capital management, yet traditional approaches often rely on restrictive distributional assumptions and static program designs. We propose a hybrid framework that uses Variational Autoencoders (VAEs) to learn the joint distribution of multi-line, multi-year claims data, and Proximal Policy Optimization (PPO), a reinforcement learning (RL) algorithm, to adapt treaty parameters dynamically. The framework explicitly targets expected surplus subject to capital and ruin-probability constraints, bridging statistical modeling and sequential decision-making. Using simulated and stress-test scenarios, including pandemic-type and catastrophe-type shocks, we show that the hybrid method produces more resilient outcomes than classical proportional and stop-loss benchmarks, delivering higher surplus and lower tail risk. Our findings highlight the usefulness of generative models for capturing cross-line dependencies and demonstrate the feasibility of RL-based dynamic treaty structuring in practical reinsurance settings. Our contributions are (i) clarifying the optimization goals in RL for reinsurance, (ii) justifying generative modeling over parametric fits, and (iii) benchmarking against established methods. This work illustrates how hybrid AI techniques can address modern challenges of portfolio diversification, catastrophe risk, and adaptive capital allocation.
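To make the objective in the abstract concrete, the following is a minimal one-period sketch of how expected surplus and ruin probability could be estimated for the two classical benchmarks, a proportional (quota-share) treaty and a stop-loss treaty. It is not the paper's method: the lognormal claims model stands in for VAE-generated scenarios, the treaty parameters are fixed rather than adapted by PPO, the reinsurance premium follows an assumed expected-value principle with a loading, and all function names and numeric values are hypothetical.

```python
import math
import random
from statistics import mean


def simulate_claims(n, mean_claim=100.0, sigma=0.5, seed=0):
    """Draw n hypothetical annual aggregate claims from a lognormal (an assumption)."""
    rng = random.Random(seed)
    # Pick mu so that the lognormal mean equals mean_claim.
    mu = math.log(mean_claim) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def quota_share(cession):
    """Proportional treaty: the insurer retains (1 - cession) of every claim."""
    return lambda claim: (1.0 - cession) * claim


def stop_loss(attachment):
    """Stop-loss treaty: the insurer retains claims up to the attachment point."""
    return lambda claim: min(claim, attachment)


def evaluate(retained, claims, capital, premium, loading):
    """Return (expected surplus, ruin probability) for a single period."""
    kept = [retained(c) for c in claims]
    ceded = [c - k for c, k in zip(claims, kept)]
    # Expected-value premium principle for the reinsurer (an assumption).
    re_premium = (1.0 + loading) * mean(ceded)
    surplus = [capital + premium - re_premium - k for k in kept]
    # Ruin is counted whenever end-of-period surplus is negative.
    ruin_prob = sum(s < 0 for s in surplus) / len(surplus)
    return mean(surplus), ruin_prob


if __name__ == "__main__":
    claims = simulate_claims(10_000)
    for name, treaty in [("quota share 50%", quota_share(0.5)),
                         ("stop-loss at 150", stop_loss(150.0))]:
        exp_surplus, ruin = evaluate(treaty, claims, capital=20.0,
                                     premium=110.0, loading=0.2)
        print(f"{name}: expected surplus={exp_surplus:.2f}, ruin prob={ruin:.4f}")
```

In a dynamic setting such as the one the abstract describes, an agent would choose the cession rate or attachment point each period, and a function like `evaluate` would supply the surplus and ruin estimates that feed a constrained reward. That wiring is left out here because the abstract does not specify it.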