Generalized additive models (GAMs) have been successfully applied to high-dimensional data analysis. However, most existing methods cannot simultaneously estimate the link function, the component functions, and the variable interactions. To alleviate this problem, we propose a new sparse additive model, the generalized sparse additive model with unknown link function (GSAMUL), in which the component functions are estimated with a B-spline basis and the unknown link function is estimated by a multi-layer perceptron (MLP) network. Furthermore, an $\ell_{2,1}$-norm regularizer is used for variable selection. The proposed GSAMUL thus performs both variable selection and hidden interaction identification. We formulate the estimation as a bilevel optimization problem in which the data are split into a training set and a validation set. In theory, we provide convergence guarantees for the approximate procedure. In applications, experimental evaluations on both synthetic and real-world data sets consistently validate the effectiveness of the proposed approach.
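To make the variable-selection role of the $\ell_{2,1}$-norm concrete, the following is a minimal sketch, not the authors' implementation. It assumes the B-spline coefficients are stacked in a matrix `B` with one row per input variable, so that $\|B\|_{2,1} = \sum_j \|\beta_j\|_2$. The function names (`l21_norm`, `prox_l21`, `selected_variables`) are hypothetical, and the group soft-thresholding step shown here is a standard proximal operator used for illustration; the abstract does not state which solver GSAMUL uses.

```python
import numpy as np

def l21_norm(B):
    """l_{2,1} norm: sum over rows of the row-wise Euclidean norm.

    Assumption: row j holds the spline coefficients of input variable j.
    """
    return float(np.sum(np.linalg.norm(B, axis=1)))

def prox_l21(B, tau):
    """Group soft-thresholding, the proximal operator of tau * l21_norm.

    Each row is shrunk toward zero; rows whose norm is at most tau
    become exactly zero, which removes that variable from the model.
    """
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already zero.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return B * scale

def selected_variables(B, tol=1e-10):
    """Indices of variables whose coefficient row is not (numerically) zero."""
    return [j for j, n in enumerate(np.linalg.norm(B, axis=1)) if n > tol]

# Toy coefficient matrix: 3 variables, 2 basis functions each (illustrative values).
B = np.array([[3.0, 4.0],
              [0.3, 0.4],
              [0.0, 0.0]])

shrunk = prox_l21(B, tau=1.0)
print(l21_norm(B))
print(shrunk)
print(selected_variables(shrunk))
```

Because the penalty acts on whole rows rather than individual entries, a weakly relevant variable is dropped as a group, which is why this regularizer suits additive models where each variable contributes several basis coefficients.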