Among semiparametric regression models, partially linear additive models are a useful tool for combining additive nonparametric components with a parametric component when explaining the relationship between a response and a set of explanatory variables. This paper concerns such models under sparsity assumptions on the covariates included in the linear component. Sparse covariates arise frequently in regression problems where variable selection is of interest. As in other settings, outliers in the residuals or in the covariates of the linear component have a harmful effect. To achieve variable selection for the parametric component and resistance to outliers simultaneously, we combine preliminary robust estimators of the additive components with robust linear $MM$-regression estimators, adding a penalty such as SCAD on the coefficients of the parametric part. Under mild assumptions, we derive consistency results and rates of convergence for the proposed estimators. A Monte Carlo study compares the performance of the robust proposal with that of its classical counterpart under different models and contamination schemes. The results show the advantage of the robust approach. We also illustrate the benefits of the proposed procedure through the analysis of a real data set.
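To make the penalty concrete, the following is a minimal sketch of the SCAD penalty of Fan and Li (2001) and of how it would be added to a loss over the linear coefficients. It is an illustration only, not the estimator proposed here: the function names `scad_penalty` and `penalized_objective` are hypothetical, the shape parameter default `a = 3.7` is the conventional choice rather than a value stated in this abstract, and `loss_value` stands in for whatever robust loss has already been evaluated.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty for one coefficient t, with tuning parameter lam > 0.

    a = 3.7 is the conventional default; it is an assumption here.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(t)
    if t <= lam:
        # Near zero the penalty is linear, like the lasso.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that flattens the penalty smoothly.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients receive a constant penalty.
    return lam * lam * (a + 1) / 2


def penalized_objective(loss_value, beta, lam, a=3.7):
    """Add the SCAD penalty of every coefficient to a precomputed loss.

    loss_value is a placeholder for a (robust) loss evaluated at beta.
    """
    return loss_value + sum(scad_penalty(b, lam, a) for b in beta)
```

Because the penalty is constant beyond `a * lam`, large coefficients are not shrunk further, while small ones are pushed toward zero. This is the property that makes SCAD attractive for variable selection.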