We develop a reinforcement learning (RL) framework for insurance loss reserving that formulates reserve setting as a finite-horizon sequential decision problem under claim development uncertainty, macroeconomic stress, and solvency governance. The reserving process is modeled as a Markov Decision Process (MDP) in which reserve adjustments influence future reserve adequacy, capital efficiency, and solvency outcomes. A Proximal Policy Optimization (PPO) agent is trained using a risk-sensitive reward that penalizes reserve shortfall, capital inefficiency, and breaches of a volatility-adjusted solvency floor, with tail risk explicitly controlled through Conditional Value-at-Risk (CVaR). To reflect regulatory stress-testing practice, the agent is trained under a regime-aware curriculum and evaluated using both regime-stratified simulations and fixed-shock stress scenarios. Empirical results for Workers' Compensation and Other Liability illustrate how the proposed RL-CVaR policy improves tail-risk control and reduces solvency violations relative to classical actuarial reserving methods, while maintaining comparable capital efficiency. We further discuss calibration and governance considerations required to align model parameters with firm-specific risk appetite and supervisory expectations under Solvency II and Own Risk and Solvency Assessment (ORSA) frameworks.
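As a rough illustration of the two ingredients named above, the following Python sketch computes an empirical CVaR over simulated losses and a per-step penalty combining reserve shortfall, excess held capital, and a solvency-floor breach. It is a minimal sketch under stated assumptions, not the formulation used in this work: the function names, the weights `w_shortfall`, `w_capital`, and `w_breach`, the use of excess reserve as a proxy for capital inefficiency, and the binary breach indicator are all hypothetical choices made for illustration.

```python
import math


def empirical_cvar(losses, alpha=0.95):
    """Average of the worst (1 - alpha) share of losses (larger = worse).

    Hypothetical helper: a plain sample estimator, not the paper's estimator.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)
    # Round before ceil so floating-point noise does not add an extra sample.
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail_size]) / tail_size


def step_reward(reserve, realized_loss, capital, solvency_floor,
                w_shortfall=1.0, w_capital=0.1, w_breach=5.0):
    """Illustrative risk-sensitive reward; all weights are assumed values."""
    shortfall = max(0.0, realized_loss - reserve)   # under-reserving
    excess = max(0.0, reserve - realized_loss)      # proxy for idle capital
    breach = 1.0 if capital < solvency_floor else 0.0
    return -(w_shortfall * shortfall + w_capital * excess + w_breach * breach)


if __name__ == "__main__":
    # Toy per-episode losses; in practice these would come from simulation.
    episode_losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(empirical_cvar(episode_losses, alpha=0.8))
    print(step_reward(reserve=100, realized_loss=120,
                      capital=50, solvency_floor=40))
```

One way such pieces could fit together is to subtract a multiple of the episode-level CVaR from the accumulated step rewards, so that the policy is penalized for heavy tails as well as average shortfall; how the two terms are actually combined and weighted is a calibration question tied to the firm's risk appetite.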