Outstanding claim liabilities are revised repeatedly as claims develop, yet most modern reserving models are trained as one-shot predictors and typically learn only from settled claims. We formulate individual claims reserving as a claim-level Markov decision process in which an agent sequentially updates outstanding claim liability (OCL) estimates over development periods, using continuous actions and a reward design that balances accuracy with stable reserve revisions. A key advantage of this reinforcement learning (RL) approach is that it can learn from all observed claim trajectories, including claims that remain open at valuation, thereby avoiding the reduced sample size and selection effects inherent in supervised methods trained on ultimate outcomes only. We also introduce practical components needed for actuarial use: initialisation of new claims, temporally consistent tuning via a rolling-settlement scheme, and an importance-weighting mechanism to mitigate portfolio-level underestimation driven by the rarity of large claims. On CAS and SPLICE synthetic general insurance datasets, the proposed Soft Actor-Critic implementation delivers competitive claim-level accuracy and strong aggregate OCL performance, particularly for the immature claim segments that drive most of the liability.
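To make the reward design concrete, one minimal illustrative form (our own sketch, not the paper's specification) penalises both the estimation error and the size of successive reserve revisions; here $\hat{R}_t$ denotes the agent's OCL estimate at development period $t$, $R^{*}$ the realised liability observed along the claim trajectory, and $\lambda \ge 0$ an assumed trade-off weight, all of which are illustrative notation:

\[
r_t \;=\; -\,\bigl|\hat{R}_t - R^{*}\bigr| \;-\; \lambda\,\bigl|\hat{R}_t - \hat{R}_{t-1}\bigr| .
\]

Under this form, a larger $\lambda$ trades some claim-level accuracy for smoother revision paths, which is the balance the abstract describes.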