Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose GRANDE, GRAdieNt-Based Decision Tree Ensembles, a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. GRANDE is based on a dense representation of tree ensembles, which makes it possible to use backpropagation with a straight-through operator to jointly optimize all model parameters. Our method combines axis-aligned splits, which are a useful inductive bias for tabular data, with the flexibility of gradient-based optimization. Furthermore, we introduce an advanced instance-wise weighting that facilitates learning representations for both simple and complex relations within a single model. We conducted an extensive evaluation on a predefined benchmark with 19 classification datasets and demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets.
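To illustrate the general idea of training a hard, axis-aligned split with a straight-through operator, the following is a minimal sketch for a single split on one feature. It is not the GRANDE implementation: the sigmoid surrogate for the backward pass, the squared-error loss, and all function names (`hard_split`, `split_grad_wrt_threshold`, `fit_threshold`) are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hard_split(x, threshold):
    """Forward pass: hard decision on one feature (1.0 = right, 0.0 = left)."""
    return 1.0 if x - threshold >= 0.0 else 0.0


def split_grad_wrt_threshold(x, threshold):
    """Backward pass (straight-through): the step function has zero gradient
    almost everywhere, so we substitute the gradient of the assumed smooth
    surrogate sigmoid(x - threshold) with respect to the threshold."""
    s = sigmoid(x - threshold)
    return -s * (1.0 - s)


def fit_threshold(xs, ys, threshold=0.0, lr=0.5, steps=200):
    """Gradient descent on the mean squared error of the hard predictions,
    using the surrogate gradient in place of the true (zero) gradient."""
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            pred = hard_split(x, threshold)  # hard output in the forward pass
            grad += 2.0 * (pred - y) * split_grad_wrt_threshold(x, threshold)
        threshold -= lr * grad / len(xs)
    return threshold


if __name__ == "__main__":
    # Toy data (hypothetical): label is 1 for large feature values, 0 otherwise.
    xs = [0.5, 1.0, 1.5, 2.5, 3.0, 3.5]
    ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    t = fit_threshold(xs, ys)
    print("learned threshold:", t)
    print("predictions:", [hard_split(x, t) for x in xs])
```

In this sketch the prediction is always a hard 0/1 decision, while the parameter update uses the surrogate gradient. A full tree ensemble would additionally have to learn which feature each split uses, the leaf values, and the per-tree weights, none of which are covered here.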