Smooth-basis models such as Chebyshev polynomial regressors and radial basis function (RBF) networks are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimisation, sensitivity analysis, and other settings where the response varies gradually with inputs. Despite these properties, smooth models seldom appear in tabular regression, where tree ensembles dominate. We ask whether they can compete, benchmarking models across 55 regression datasets organised by application domain. We develop an anisotropic RBF network with data-driven centre placement and gradient-based width optimisation, a ridge-regularised Chebyshev polynomial regressor, and a smooth-tree hybrid (Chebyshev model tree); all three are released as scikit-learn-compatible packages. We benchmark these against tree ensembles, a pre-trained transformer, and standard baselines, evaluating accuracy alongside generalisation behaviour. The transformer ranks first on accuracy across a majority of datasets, but its GPU dependence, inference latency, and dataset-size limits constrain deployment in the CPU-based settings common across applied science and industry. Among CPU-viable models, smooth models and tree ensembles are statistically tied on accuracy, but the former tend to exhibit tighter generalisation gaps. We recommend routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalisation and gradually varying predictions.
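To make the idea of a ridge-regularised Chebyshev regressor and a generalisation gap concrete, the following is a minimal sketch and not the released package. It rests on several assumptions that the abstract does not state: the basis is additive (per-feature Chebyshev terms with no interactions), features are min-max scaled to [-1, 1], the ridge problem is solved in closed form, and the generalisation gap is taken as train R² minus test R². The class name `ChebyshevRidgeSketch`, the helper `r2`, and the synthetic data are hypothetical.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander


class ChebyshevRidgeSketch:
    """Illustrative additive Chebyshev regressor with a ridge penalty.

    Hypothetical class for illustration; not the API of the released package.
    """

    def __init__(self, degree=5, alpha=1e-3):
        self.degree = degree  # highest Chebyshev order per feature
        self.alpha = alpha    # ridge penalty strength

    def _design(self, X):
        # Map each feature to [-1, 1] using the training range, clipping so
        # that points outside that range stay inside the Chebyshev domain.
        Z = np.clip(2.0 * (X - self.lo_) / self.span_ - 1.0, -1.0, 1.0)
        # chebvander gives T_0..T_degree per feature; drop the constant T_0
        # because the intercept is handled by centring below.
        cols = [chebvander(Z[:, j], self.degree)[:, 1:] for j in range(Z.shape[1])]
        return np.hstack(cols)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.lo_ = X.min(axis=0)
        span = X.max(axis=0) - self.lo_
        self.span_ = np.where(span > 0, span, 1.0)  # guard constant features
        Phi = self._design(X)
        self.phi_mean_ = Phi.mean(axis=0)
        self.y_mean_ = y.mean()
        Pc = Phi - self.phi_mean_
        # Closed-form ridge solution: (Pc^T Pc + alpha I) w = Pc^T (y - mean(y)).
        A = Pc.T @ Pc + self.alpha * np.eye(Pc.shape[1])
        self.coef_ = np.linalg.solve(A, Pc.T @ (y - self.y_mean_))
        return self

    def predict(self, X):
        Phi = self._design(np.asarray(X, dtype=float))
        return (Phi - self.phi_mean_) @ self.coef_ + self.y_mean_


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic smooth, additive target (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=400)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

model = ChebyshevRidgeSketch(degree=6, alpha=1e-3).fit(X_train, y_train)
train_r2 = r2(y_train, model.predict(X_train))
test_r2 = r2(y_test, model.predict(X_test))
generalisation_gap = train_r2 - test_r2  # assumed definition of the gap
print(f"train R2={train_r2:.3f}  test R2={test_r2:.3f}  gap={generalisation_gap:.3f}")
```

On a smooth target like this one, a low-degree basis with a small penalty would be expected to fit well with a small gap. Real tabular data with interactions or discontinuities would need a richer basis or a hybrid such as the model tree mentioned above.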