Classical methods for quantile regression fail when the quantile of interest is extreme and few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several existing approaches do so using linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized cases. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
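To make the estimation step concrete, the following is a minimal sketch of a weighted generalized Pareto likelihood with a penalized shape parameter, together with the standard peaks-over-threshold formula for extrapolating from an intermediate quantile level to an extreme one. It is an illustration under stated assumptions, not the authors' implementation: the function names, the quadratic form of the penalty, the reference value `xi_prior`, and the penalty weight `lam` are all hypothetical, and the weights are taken as given rather than extracted from a fitted quantile random forest.

```python
import math


def gpd_weighted_nll(sigma, xi, exceedances, weights, xi_prior=0.0, lam=0.0):
    """Weighted negative GPD log-likelihood plus a penalty on the shape xi.

    exceedances: amounts by which responses exceed an intermediate threshold.
    weights:     localizing weights for the predictor value of interest
                 (assumed here to be supplied by the caller).
    The quadratic penalty lam * (xi - xi_prior)**2 is an assumed form.
    """
    if sigma <= 0:
        return math.inf  # the scale parameter must be positive
    total = 0.0
    for z, w in zip(exceedances, weights):
        if abs(xi) < 1e-9:
            # Limit xi -> 0: exponential tail
            total += w * (math.log(sigma) + z / sigma)
        else:
            arg = 1.0 + xi * z / sigma
            if arg <= 0:
                return math.inf  # observation outside the GPD support
            total += w * (math.log(sigma) + (1.0 + 1.0 / xi) * math.log(arg))
    return total + lam * (xi - xi_prior) ** 2


def gpd_extreme_quantile(threshold, sigma, xi, tau0, tau):
    """Extrapolate from the intermediate level tau0 to an extreme level tau.

    threshold is the (conditional) quantile at level tau0; sigma and xi are
    the GPD scale and shape fitted to exceedances over that threshold.
    """
    ratio = (1.0 - tau0) / (1.0 - tau)
    if abs(xi) < 1e-9:
        return threshold + sigma * math.log(ratio)
    return threshold + sigma / xi * (ratio ** xi - 1.0)
```

In practice the penalized objective would be minimized over `sigma` and `xi` with a numerical optimizer for each predictor value, and the fitted parameters plugged into `gpd_extreme_quantile`. Larger values of `lam` pull the shape estimate toward `xi_prior`, which reduces its variability across the predictor space.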