Gradient boosting is a sequential ensemble method that, at each step, fits a new base learner to the gradient of the remaining loss. We propose a novel family of gradient boosting methods, Wasserstein gradient boosting, which fits a new base learner to an exactly or approximately available Wasserstein gradient of a loss functional on the space of probability distributions. Wasserstein gradient boosting returns a set of particles that approximates a target probability distribution assigned at each input. In probabilistic prediction, a parametric probability distribution is often specified on the space of output variables, and a model produces a point estimate of the output-distribution parameter for each input. Our main application of Wasserstein gradient boosting is a novel distributional estimate of the output-distribution parameter, which approximates the posterior distribution over the output-distribution parameter determined pointwise at each data point. We empirically demonstrate that probabilistic prediction by Wasserstein gradient boosting outperforms various existing methods.
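For readers unfamiliar with the starting point, the following is a minimal sketch of classical gradient boosting only, that is, the sequential fitting of base learners to the gradient of the remaining loss described in the first sentence. It is not the Wasserstein variant proposed here. It assumes squared-error loss, one-dimensional inputs, and depth-one regression stumps as base learners; all function names (`fit_stump`, `predict_stump`, `gradient_boost`, `predict`, `mse`) and the defaults `n_rounds` and `learning_rate` are hypothetical choices made for illustration.

```python
# Illustrative sketch: classical gradient boosting with squared-error loss.
# Assumptions (not from the abstract): 1-D inputs, regression stumps as base
# learners, a fixed learning rate. This is NOT Wasserstein gradient boosting.

def fit_stump(xs, targets):
    """Fit a depth-1 regression stump; return (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        if not right:  # splitting at the largest input leaves one side empty
            continue
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant prediction
        mean = sum(targets) / len(targets)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return (initial_value, stumps, learning_rate) fitted by gradient boosting."""
    init = sum(ys) / len(ys)  # constant initial model
    preds = [init] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - f)^2, the negative gradient with respect to
        # the prediction f is the residual y - f.
        neg_grad = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, neg_grad)  # new base learner fitted to the gradient
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return init, stumps, learning_rate


def predict(model, x):
    init, stumps, learning_rate = model
    return init + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Toy data: a step function on ten points.
    xs = list(range(10))
    ys = [0.0 if x < 5 else 1.0 for x in xs]
    model = gradient_boost(xs, ys)
    print("training MSE:", mse(model, xs, ys))
```

In this classical setting each base learner targets a gradient taken with respect to a scalar prediction. The method described in the abstract instead targets a Wasserstein gradient of a loss functional on the space of probability distributions and returns a set of particles for each input, which this sketch does not attempt to reproduce.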