Multicalibration gradient boosting has recently emerged as a scalable method that empirically produces approximately multicalibrated predictors and has been deployed at web scale. Despite this empirical success, its convergence properties are not well understood. In this paper, we bridge this gap by providing convergence guarantees for multicalibration gradient boosting in regression with squared-error loss. We show that the magnitude of successive prediction updates decays at rate $O(1/\sqrt{T})$, which implies the same convergence rate bound for the multicalibration error over rounds. Under additional smoothness assumptions on the weak learners, this rate improves to linear convergence. We further analyze adaptive variants, showing local quadratic convergence of the training loss, and we study rescaling schemes that preserve convergence. Experiments on real-world datasets support our theory and clarify the regimes in which the method achieves fast convergence and strong multicalibration.
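As a rough sketch of the claimed relationship (the formal definitions are fixed in the body of the paper; the conditional-bias form of the multicalibration error below is one standard choice and is assumed here only for illustration):
\[
  \bigl\|f_{T+1} - f_T\bigr\| = O\!\bigl(1/\sqrt{T}\bigr)
  \quad\Longrightarrow\quad
  \max_{c \in \mathcal{C},\, v}\;
  \bigl|\,\mathbb{E}\bigl[(y - f_T(x))\,c(x) \mid f_T(x) = v\bigr]\bigr|
  = O\!\bigl(1/\sqrt{T}\bigr),
\]
where $f_t$ denotes the predictor after $t$ boosting rounds and $\mathcal{C}$ the class of group (auditor) functions.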