Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized case. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
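To make the estimation step concrete, the following minimal sketch writes out a weighted generalized Pareto negative log-likelihood with a penalty on the shape parameter, together with the standard peaks-over-threshold formula for extrapolating from an intermediate quantile to an extreme one. It is an illustration under stated assumptions, not the authors' implementation: the function names, the quadratic form of the penalty, and the parameters `lam` and `xi_ref` are hypothetical, and the weights are assumed to have been obtained beforehand from a quantile random forest for the predictor value of interest.

```python
import math


def gpd_neg_loglik(exceedances, weights, sigma, xi, lam=0.0, xi_ref=0.0):
    """Weighted GPD negative log-likelihood plus a shape penalty.

    exceedances: positive amounts by which responses exceed the
                 intermediate threshold.
    weights:     localizing weights for one predictor value (assumed to
                 come from a quantile random forest).
    lam, xi_ref: hypothetical penalty strength and reference shape; the
                 quadratic penalty is an assumption of this sketch.
    """
    if sigma <= 0:
        return math.inf
    total = 0.0
    for z, w in zip(exceedances, weights):
        if w == 0:
            continue  # points with zero weight do not contribute
        if abs(xi) < 1e-8:
            # Exponential limit of the GPD as xi tends to 0.
            total += w * (math.log(sigma) + z / sigma)
        else:
            arg = 1.0 + xi * z / sigma
            if arg <= 0:
                return math.inf  # observation lies outside the GPD support
            total += w * (math.log(sigma) + (1.0 + 1.0 / xi) * math.log(arg))
    return total + lam * (xi - xi_ref) ** 2


def extrapolate_quantile(q_intermediate, sigma, xi, tau0, tau):
    """Quantile at level tau >= tau0 from the GPD tail approximation."""
    ratio = (1.0 - tau0) / (1.0 - tau)
    if abs(xi) < 1e-8:
        return q_intermediate + sigma * math.log(ratio)
    return q_intermediate + sigma / xi * (ratio ** xi - 1.0)
```

In such a sketch, the scale and shape at a given predictor value would be obtained by minimizing `gpd_neg_loglik` over `(sigma, xi)` with any numerical optimizer, and the fitted values would then be passed to `extrapolate_quantile` along with an intermediate quantile estimate at level `tau0`.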