Generalized additive models (GAMs) have been successfully applied to high-dimensional data analysis. However, most existing methods cannot simultaneously estimate the link function, the component functions, and the variable interactions. To alleviate this problem, we propose a new sparse additive model, the generalized sparse additive model with unknown link function (GSAMUL), in which the component functions are estimated with a B-spline basis and the unknown link function is estimated by a multi-layer perceptron (MLP) network. Furthermore, an $\ell_{2,1}$-norm regularizer is used for variable selection. The proposed GSAMUL can perform variable selection and capture hidden interactions at the same time. We formulate the estimation as a bilevel optimization problem in which the data are split into a training set and a validation set. In theory, we provide convergence guarantees for the approximate procedure. In applications, experimental evaluations on both synthetic and real-world data sets consistently validate the effectiveness of the proposed approach.
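As a rough illustration of how an $\ell_{2,1}$-norm regularizer can drive variable selection in an additive model, the sketch below groups the B-spline coefficients of each variable into one row, sums the Euclidean norms of the rows, and applies group soft-thresholding, the standard proximal operator of that penalty. This is not the authors' implementation: the abstract does not say how GSAMUL optimizes the penalty, and the function names, the coefficient values, and the threshold `tau` are all hypothetical.

```python
import math


def l21_norm(coefs):
    """l_{2,1} norm: sum over variables of the Euclidean norm of each coefficient group."""
    return sum(math.sqrt(sum(c * c for c in group)) for group in coefs)


def group_soft_threshold(coefs, tau):
    """Proximal operator of tau * l21_norm: shrink every group toward zero.

    A group whose norm is at most tau is set to zero as a whole, which is
    what removes the corresponding variable from the model.
    """
    shrunk = []
    for group in coefs:
        norm = math.sqrt(sum(c * c for c in group))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        shrunk.append([scale * c for c in group])
    return shrunk


def selected_variables(coefs):
    """Indices of variables that still have at least one nonzero coefficient."""
    return [j for j, group in enumerate(coefs) if any(c != 0.0 for c in group)]


# Hypothetical example: 3 variables, 4 B-spline basis coefficients each.
coefs = [
    [3.0, 4.0, 0.0, 0.0],     # strong component, group norm 5
    [0.1, -0.1, 0.1, -0.1],   # weak component, group norm about 0.2
    [0.0, 0.0, 6.0, 8.0],     # strong component, group norm 10
]

shrunk = group_soft_threshold(coefs, tau=1.0)
print(selected_variables(shrunk))
```

Because the penalty acts on whole groups rather than on individual coefficients, the weak second variable is dropped entirely while the two strong variables are only shrunk, which is the behavior a group-sparse penalty is typically chosen for.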