Gradient boosting of prediction rules is an efficient approach to learning potentially interpretable yet accurate probabilistic models. However, actual interpretability requires limiting the number and size of the generated rules, and existing boosting variants are not designed for this purpose. Though corrective boosting refits all rule weights in each iteration to minimise prediction risk, the included rule conditions tend to be sub-optimal, because commonly used objective functions fail to anticipate this refitting. Here, we address this issue with a new objective function that measures the angle between the risk gradient vector and the projection of the condition output vector onto the orthogonal complement of the already selected conditions. This approach correctly approximates the ideal update of adding the risk gradient itself to the model and favours the inclusion of more general and thus shorter rules. As we demonstrate using a wide range of prediction tasks, this significantly improves the comprehensibility/accuracy trade-off of the fitted ensemble. Additionally, we show how objective values for related rule conditions can be computed incrementally to avoid any substantial computational overhead of the new method.
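To make the geometric idea concrete, the following is a minimal sketch of an angle-based score for a candidate condition. It is an illustration under stated assumptions, not the authors' implementation: the function names (`orthogonal_residual`, `angle_objective`), the use of the absolute cosine as a stand-in for the angle, and the least-squares projection are all choices made here. A larger cosine corresponds to a smaller angle between the risk gradient and the part of the condition output vector not already explained by the selected conditions.

```python
import numpy as np


def orthogonal_residual(selected, q):
    """Project q onto the orthogonal complement of span(selected).

    `selected` is an (n, k) matrix whose columns are the output vectors of
    the already selected conditions (hypothetical layout, assumed here).
    """
    q = np.asarray(q, dtype=float)
    if selected.shape[1] == 0:
        return q
    # Least squares gives the projection onto span(selected),
    # also when the columns are linearly dependent.
    coef, *_ = np.linalg.lstsq(selected, q, rcond=None)
    return q - selected @ coef


def angle_objective(grad, q, selected, eps=1e-12):
    """Absolute cosine of the angle between grad and the residual of q.

    Assumption: the sign is ignored because a refitted weight can be
    negative. Returns 0.0 if q lies (numerically) in span(selected).
    """
    grad = np.asarray(grad, dtype=float)
    q_perp = orthogonal_residual(selected, q)
    denom = np.linalg.norm(grad) * np.linalg.norm(q_perp)
    if denom < eps:
        return 0.0
    return abs(grad @ q_perp) / denom


# Toy data: four examples, one already selected condition covering all of them.
grad = np.array([1.0, 1.0, -1.0, -1.0])
selected = np.ones((4, 1))
general = np.array([1.0, 1.0, 0.0, 0.0])   # covers two examples
specific = np.array([1.0, 0.0, 0.0, 0.0])  # covers one example

score_general = angle_objective(grad, general, selected)
score_specific = angle_objective(grad, specific, selected)
```

In this toy setting the more general condition aligns better with the gradient after projection, so it receives the higher score. The incremental computation of objective values mentioned in the abstract is not sketched here, since the abstract does not describe it in enough detail.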