Sparse regression and classification estimators that respect group structures have applications in a range of statistical and machine learning problems, from multitask learning to sparse additive modeling to hierarchical selection. This work introduces structured sparse estimators that combine group subset selection with shrinkage. To accommodate sophisticated structures, our estimators allow for arbitrary overlap between groups. We develop an optimization framework for fitting the nonconvex regularization surface and present finite-sample error bounds for estimation of the regression function. As an application requiring structure, we study sparse semiparametric additive modeling, a procedure that allows the effect of each predictor to be zero, linear, or nonlinear. For this task, the new estimators outperform alternatives across several metrics on synthetic data. Finally, we demonstrate their efficacy in modeling supermarket foot traffic and economic recessions using many predictors. These demonstrations suggest that sparse semiparametric additive models, fit using the new estimators, are an excellent compromise between fully linear and fully nonparametric alternatives. All of our algorithms are made available in the scalable implementation grpsel.
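To make the overlapping-group idea concrete, the following is a minimal sketch of how a zero/linear/nonlinear choice per predictor could be encoded with two overlapping groups. It is not the grpsel API and not necessarily the paper's exact formulation. The assumptions are ours: each predictor is expanded into one linear column plus a few nonlinear basis columns; each predictor gets a linear-only group and a group containing all of its columns; the coefficient vector is the sum of one latent vector per group; and the penalty charges a fixed cost `lam0` per active group plus an unsquared group-norm shrinkage term weighted by `lam2`. All function and parameter names are hypothetical.

```python
from math import sqrt


def build_groups(n_predictors, n_nonlinear):
    """Two overlapping column-index groups per predictor (assumed layout)."""
    width = 1 + n_nonlinear  # one linear column plus the nonlinear basis columns
    groups = []
    for j in range(n_predictors):
        start = j * width
        groups.append([start])                            # linear term only
        groups.append(list(range(start, start + width)))  # linear + nonlinear terms
    return groups


def combine(latent, groups, n_cols):
    """Sum the latent per-group vectors into one coefficient vector."""
    beta = [0.0] * n_cols
    for values, group in zip(latent, groups):
        for idx, value in zip(group, values):
            beta[idx] += value
    return beta


def penalty(latent, lam0, lam2):
    """Subset-selection cost per active group plus group-norm shrinkage."""
    total = 0.0
    for values in latent:
        norm = sqrt(sum(v * v for v in values))
        if norm > 0.0:
            total += lam0 + lam2 * norm
    return total


def effect_types(beta, n_predictors, n_nonlinear):
    """Classify each predictor's fitted effect as zero, linear, or nonlinear."""
    width = 1 + n_nonlinear
    types = []
    for j in range(n_predictors):
        block = beta[j * width:(j + 1) * width]
        if any(b != 0.0 for b in block[1:]):
            types.append("nonlinear")
        elif block[0] != 0.0:
            types.append("linear")
        else:
            types.append("zero")
    return types


# Illustrative values only: three predictors, two nonlinear basis columns each.
groups = build_groups(3, 2)
latent = [[0.0], [0.0, 0.0, 0.0],   # predictor 0: both groups inactive
          [1.5], [0.0, 0.0, 0.0],   # predictor 1: linear-only group active
          [0.0], [3.0, 4.0, 0.0]]   # predictor 2: full group active
beta = combine(latent, groups, n_cols=9)
print(effect_types(beta, 3, 2), penalty(latent, lam0=1.0, lam2=0.5))
```

Under this encoding, activating only the small group gives a linear effect at the cost of one group, while a nonlinear effect requires the larger group. This is one way arbitrary overlap lets a group-sparse penalty express the three-way choice.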