For classification and regression on tabular data, the dominance of gradient-boosted decision trees (GBDTs) has recently been challenged by deep learning methods, which are often much slower and rely on extensive hyperparameter tuning. We address this discrepancy by introducing (a) RealMLP, an improved multilayer perceptron (MLP), and (b) improved default parameters for GBDTs and RealMLP. We tune RealMLP and the default parameters on a meta-train benchmark with 71 classification and 47 regression datasets and compare them to hyperparameter-optimized versions on a disjoint meta-test benchmark with 48 classification and 42 regression datasets, as well as on the GBDT-friendly benchmark by Grinsztajn et al. (2022). Our benchmark results show that RealMLP offers a better time-accuracy tradeoff than other neural nets and is competitive with GBDTs. Moreover, a combination of RealMLP and GBDTs with improved default parameters can achieve excellent results on medium-sized tabular datasets (1K--500K samples) without hyperparameter tuning.