In this work, we consider the problem of imbalanced data in a regression framework, where the imbalance concerns continuous or discrete covariates. Such imbalance can bias the estimates. To address it, we propose an algorithm that combines a weighted resampling (WR) procedure with a data augmentation (DA) procedure. First, the DA procedure makes it possible to explore a wider support than that of the original data. Second, the WR method drives the distribution of the exogenous variables toward a target distribution. We discuss the choice of the DA procedure through a numerical study that illustrates the advantages of this approach. Finally, we study an actuarial application.
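The two-step scheme (DA, then WR) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' procedure. We take the DA step to be a smoothed bootstrap: Gaussian jitter with bandwidth `h` is added to a single continuous covariate, and the response is copied unchanged. We also assume the target covariate density `target_pdf` is known. WR is implemented as importance resampling, with weights equal to the target density divided by the density of the augmented covariates. All function names here are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def augment(xs, ys, n_aug, h, rng):
    """Step 1 (DA, assumed smoothed bootstrap): pick an observation uniformly
    at random and jitter its covariate with N(0, h^2) noise; y is copied as-is."""
    aug = []
    for _ in range(n_aug):
        i = rng.randrange(len(xs))
        aug.append((xs[i] + rng.gauss(0.0, h), ys[i]))
    return aug


def proposal_density(x, xs, h):
    """Density of an augmented covariate: the Gaussian KDE of the original xs."""
    return sum(gaussian_pdf(x, xi, h) for xi in xs) / len(xs)


def weighted_resample(aug, xs, h, target_pdf, m, rng):
    """Step 2 (WR): importance weights target / proposal, then draw m
    augmented pairs with replacement, with probability proportional to the weights."""
    weights = [target_pdf(x) / proposal_density(x, xs, h) for x, _ in aug]
    return rng.choices(aug, weights=weights, k=m)


def augment_then_resample(xs, ys, target_pdf, n_aug, m, h, seed=0):
    """Run DA followed by WR and return m (x, y) pairs."""
    rng = random.Random(seed)
    aug = augment(xs, ys, n_aug, h, rng)
    return weighted_resample(aug, xs, h, target_pdf, m, rng)


if __name__ == "__main__":
    # Imbalanced covariate: 90% of points near 0, 10% near 3.
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 0.5) for _ in range(180)] + [rng.gauss(3.0, 0.5) for _ in range(20)]
    ys = [2.0 * x for x in xs]
    out = augment_then_resample(xs, ys, lambda x: gaussian_pdf(x, 1.5, 1.0),
                                n_aug=3000, m=1000, h=0.3)
    print(sum(xs) / len(xs), sum(x for x, _ in out) / len(out))
```

The augmented covariates are drawn from a known kernel mixture, so the weights are exact importance ratios. Without the DA step, WR alone could only reweight the observed points and could never place mass outside the original support.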