Forward stagewise regression is a simple algorithm for estimating regularized models. In each iteration, its updating rule adds a small constant to a regression coefficient, so that the underlying optimization problem is solved slowly through small improvements. This is similar to gradient boosting, with the essential difference that in gradient boosting the step size is the product of the gradient and a step length parameter. An often overlooked challenge in gradient boosting for distributional regression is a vanishingly small gradient, which in practice halts the algorithm's progress. We show that gradient boosting in this case often yields suboptimal models; especially for complex problems, certain distributional parameters are never updated because of the vanishing gradient. Therefore, we propose a stagewise boosting-type algorithm for distributional regression that combines stagewise regression ideas with gradient boosting. We extend it with a novel regularization method, correlation filtering, which provides additional stability when the problem involves a large number of covariates. Furthermore, the algorithm includes best-subset selection for parameters and can be applied to big data problems by leveraging stochastic approximations of the updating steps. Besides making large datasets tractable, the stochastic nature of the approximations can lead to better results, especially for complex distributions, by reducing the risk of being trapped in a local optimum. The performance of the proposed stagewise boosting distributional regression approach is investigated in an extensive simulation study and by estimating a full probabilistic model for lightning counts with data comprising more than 9.1 million observations and 672 covariates.
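To make the contrast between the two updating rules concrete, the following is a minimal sketch of classical forward stagewise regression for a squared-error loss, together with the two step rules side by side. It is not the proposed distributional algorithm: correlation filtering, best-subset selection, and the stochastic approximations are omitted. The function names (`forward_stagewise`, `gradient_step`, `stagewise_step`) and the parameters `eps`, `nu`, and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_iter=2000):
    """Illustrative forward stagewise regression for squared-error loss.

    Each iteration moves exactly one coefficient by the constant eps,
    in the direction that reduces the residual sum of squares.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        # Inner product of each column with the residual; this is the
        # negative gradient of 0.5 * RSS with respect to beta.
        corr = X.T @ resid
        j = int(np.argmax(np.abs(corr)))
        if corr[j] == 0.0:
            break  # gradient is exactly zero, nothing left to update
        delta = eps * np.sign(corr[j])  # constant-size step, sign only
        beta[j] += delta
        resid -= delta * X[:, j]
    return beta


def gradient_step(grad, nu=0.1):
    """Gradient-boosting-style update: step length times gradient."""
    return -nu * grad


def stagewise_step(grad, eps=0.01):
    """Stagewise-style update: constant step, only the sign of the gradient."""
    return -eps * np.sign(grad)


if __name__ == "__main__":
    # Synthetic data (assumption: three covariates, one of them irrelevant).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 3))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.standard_normal(500)
    print("stagewise coefficients:", forward_stagewise(X, y))

    # With a tiny gradient the gradient-based step nearly vanishes,
    # while the stagewise step keeps its constant size.
    tiny_grad = 1e-8
    print("gradient step: ", gradient_step(tiny_grad))
    print("stagewise step:", stagewise_step(tiny_grad))
```

Because the stagewise step ignores the magnitude of the gradient, a parameter whose gradient is close to zero but not exactly zero still moves by `eps`. The gradient-based step shrinks in proportion to the gradient, which is the stalling behaviour described above.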