Integrating Complex Covariate Transformations in Generalized Additive Models

Transformations of covariates are widely used in applied statistics to improve interpretability and to satisfy the assumptions required for valid inference. Feature engineering encompasses a broader set of practices aimed at enhancing predictive performance, and is typically performed as a data pre-processing step. In contrast, this paper integrates a substantial part of the feature engineering process directly into the modelling stage, by introducing a novel general framework for embedding interpretable covariate transformations within multi-parameter Generalised Additive Models (GAMs). Our framework accommodates any sufficiently differentiable scalar-valued transformation of potentially high-dimensional and complex covariates. These transformations are treated as integral model components: their parameters are estimated jointly with the regression coefficients via maximum a posteriori (MAP) methods, and joint uncertainty is quantified via approximate Bayesian techniques. Smoothing parameters are selected in an empirical Bayes framework using a Laplace approximation to the marginal likelihood, supported by efficient computation based on implicit differentiation methods. We demonstrate the flexibility and practical value of the proposed methodology through applications to forecasting electricity net-demand in Great Britain and to modelling house prices in London. Methods for building and fitting GAMs with nested transformations are provided by the gamFactory R package, available at https://github.com/mfasiolo/gamFactory, while the code for reproducing the results in this paper is available at https://doi.org/10.5281/zenodo.19239350.
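
To make the idea of a nested transformation concrete, the toy sketch below estimates a transformation parameter jointly with regression coefficients by maximising a log-posterior. It is not the gamFactory API and not the algorithm of the paper. Every name in it (exp_smooth, ridge_fit, neg_log_post, fit_map) is hypothetical, and several simplifications are assumed: the transformation is an exponential smooth with a single parameter alpha, the effect of the transformed covariate is linear rather than a penalised smooth, the noise variance is fixed at 1, the penalty lam is fixed rather than selected by a Laplace-approximate marginal likelihood, and the optimisation is a grid search over alpha with the coefficients profiled out in closed form rather than a Newton-type method with implicit differentiation.

```python
import math
import random


def exp_smooth(x, alpha):
    """Transformation s_t = w * s_{t-1} + (1 - w) * x_t, with w = sigmoid(alpha)."""
    w = 1.0 / (1.0 + math.exp(-alpha))
    s = x[0]
    out = []
    for xt in x:
        s = w * s + (1.0 - w) * xt
        out.append(s)
    return out


def ridge_fit(s, y, lam):
    """Closed-form MAP of (b0, b1) in y = b0 + b1 * s + noise.

    Assumes unit noise variance and a N(0, 1/lam) prior on b1 only.
    """
    n = len(y)
    sum_s = sum(s)
    sum_ss = sum(v * v for v in s)
    sum_y = sum(y)
    sum_sy = sum(a * b for a, b in zip(s, y))
    # Normal equations: [[n, sum_s], [sum_s, sum_ss + lam]] (b0, b1)' = (sum_y, sum_sy)'
    det = n * (sum_ss + lam) - sum_s ** 2
    b0 = ((sum_ss + lam) * sum_y - sum_s * sum_sy) / det
    b1 = (n * sum_sy - sum_s * sum_y) / det
    return b0, b1


def neg_log_post(alpha, x, y, lam=1.0, prior_sd=3.0):
    """Negative log-posterior (up to a constant), with (b0, b1) profiled out."""
    s = exp_smooth(x, alpha)
    b0, b1 = ridge_fit(s, y, lam)
    rss = sum((yi - b0 - b1 * si) ** 2 for yi, si in zip(y, s))
    return 0.5 * rss + 0.5 * lam * b1 ** 2 + 0.5 * (alpha / prior_sd) ** 2


def fit_map(x, y, grid, lam=1.0):
    """Joint MAP over (alpha, b0, b1), restricted to candidate alphas in `grid`."""
    alpha_hat = min(grid, key=lambda a: neg_log_post(a, x, y, lam))
    b0, b1 = ridge_fit(exp_smooth(x, alpha_hat), y, lam)
    return alpha_hat, b0, b1


# Simulated data (illustrative only): the response depends on the smoothed covariate.
random.seed(1)
GRID = [-4.0 + 0.1 * k for k in range(81)]
ALPHA_TRUE = GRID[54]  # roughly 1.4, i.e. a smoothing weight w of about 0.8
x = [random.gauss(0.0, 1.0) for _ in range(400)]
y = [1.0 + 2.0 * s + random.gauss(0.0, 0.3) for s in exp_smooth(x, ALPHA_TRUE)]

alpha_hat, b0_hat, b1_hat = fit_map(x, y, GRID)
w_hat = 1.0 / (1.0 + math.exp(-alpha_hat))
print("estimated smoothing weight:", w_hat)
print("estimated coefficients:", b0_hat, b1_hat)
```

Profiling the coefficients out for each candidate alpha and then minimising over alpha reaches the same maximiser as a joint optimisation over the grid, which keeps the sketch short. A framework of the kind described above would instead handle smooth effects, several transformations and data-driven smoothing parameters.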
