We introduce new approaches for forecasting IBNR (Incurred But Not Reported) frequencies by leveraging individual claims data, which include the accident date, the reporting delay, and possibly additional features for every reported claim. A key element of our proposal is the computation of development factors, which may depend on both the accident date and other features. These development factors serve as the basis for predictions. While we assume close to continuous observations of accident date and reporting delay, the development factors can be expressed at any level of granularity, such as months, quarters, or years, and predictions across different granularity levels are coherent. The calculation of development factors relies on the estimation of a hazard function in reverse development time, and we present three distinct methods for estimating this function: the Cox proportional hazards model, a feed-forward neural network, and XGBoost (eXtreme Gradient Boosting). In all three cases, estimation is based on the same partial likelihood, which accommodates left truncation and ties in the data. While the first is a semi-parametric model that partly assumes a log-linear structure, the two machine learning approaches only assume that the baseline and the other factors are multiplicatively separable. In an extensive simulation study and a real-world data application, our approach shows promising results. This paper comes with an accompanying R package, $\texttt{ReSurv}$, which can be accessed at \url{https://github.com/edhofman/ReSurv}.
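To illustrate the link between a hazard in reverse development time and development factors, the following minimal Python sketch covers only the simplest setting: discrete periods, no features, and a nonparametric hazard estimate. It is not the method of the paper and does not use the $\texttt{ReSurv}$ API. All function names and the simulated data are hypothetical and chosen for illustration. Under these assumptions, the classical chain-ladder factor from delay $j$ to $j+1$ equals $1/(1-\hat{\alpha}_{j+1})$, where $\hat{\alpha}_{j+1}$ is the share of claims reported at delay $j+1$ among those with delay at most $j+1$ whose accident period is old enough for delay $j+1$ to be observable (the left-truncation adjustment).

```python
import random


def simulate_reported_claims(n_periods, n_claims, seed=0):
    """Simulate (accident_period, delay) pairs and keep only reported claims.

    A claim is reported by the evaluation date only if
    accident_period + delay <= n_periods - 1 (run-off triangle truncation).
    The delay distribution is an arbitrary choice for illustration.
    """
    rng = random.Random(seed)
    claims = []
    for _ in range(n_claims):
        accident = rng.randrange(n_periods)
        delay = min(int(rng.expovariate(0.5)), n_periods - 1)
        if accident + delay <= n_periods - 1:
            claims.append((accident, delay))
    return claims


def chain_ladder_factors(claims, n_periods):
    """Classical chain-ladder development factors from the aggregated triangle."""
    # Incremental counts per (accident period, delay) cell.
    inc = [[0] * n_periods for _ in range(n_periods)]
    for accident, delay in claims:
        inc[accident][delay] += 1
    # Cumulative counts along the delay axis.
    cum = [[sum(row[: j + 1]) for j in range(n_periods)] for row in inc]
    factors = []
    for j in range(n_periods - 1):
        # Only accident periods for which delay j + 1 is already observed.
        rows = range(n_periods - 1 - j)
        numerator = sum(cum[i][j + 1] for i in rows)
        denominator = sum(cum[i][j] for i in rows)
        factors.append(numerator / denominator)
    return factors


def reverse_time_hazards(claims, n_periods):
    """Nonparametric hazard in reverse development time, for delays 1..n_periods-1.

    Reading delays from largest to smallest, a claim "exits" at its delay.
    The risk set at delay j holds claims with delay <= j whose accident period
    is old enough for delay j to be observable (left truncation).
    """
    hazards = []
    for j in range(1, n_periods):
        max_accident = n_periods - 1 - j
        at_risk = sum(1 for a, d in claims if d <= j and a <= max_accident)
        events = sum(1 for a, d in claims if d == j)
        hazards.append(events / at_risk)
    return hazards


claims = simulate_reported_claims(n_periods=6, n_claims=5000)
cl_factors = chain_ladder_factors(claims, 6)
hazard_factors = [1.0 / (1.0 - h) for h in reverse_time_hazards(claims, 6)]
for j, (f_cl, f_hz) in enumerate(zip(cl_factors, hazard_factors)):
    print(f"delay {j} -> {j + 1}: chain ladder {f_cl:.6f}, from hazard {f_hz:.6f}")
```

In this feature-free, discrete setting the two columns agree up to floating-point error. The approaches described above replace the nonparametric hazard with a model-based estimate that can depend on accident date and other features.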