Linear regression models have been studied extensively in the literature. In some practical applications, however, a linear model may not be appropriate over the whole range of the covariate. In this paper, we introduce a more flexible regression model $Y=r(X)+\varepsilon$ in which the regression function $r(\cdot)$ is assumed to be linear only for large values of the predictor variable $X$. More precisely, we assume that $r(x)=\alpha_0+\beta_0 x$ for $x> u_0$, where $u_0$ is the smallest value for which this property holds. A penalized procedure is introduced to estimate the threshold $u_0$. The proposal is semiparametric, since no parametric model is assumed for the regression function at values smaller than $u_0$. Consistency of both the threshold estimator and the estimators of $(\alpha_0,\beta_0)$ is derived under mild assumptions. A numerical study investigates the small-sample properties of the proposed procedure and the importance of the penalization. The analysis of a real data set illustrates the usefulness of the penalized estimators.
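To make the setting concrete, the following is a minimal sketch of threshold selection in a model that is linear only for $x>u_0$. It is not the criterion studied in the paper, which the abstract does not specify. The objective used here (tail mean squared residual plus `lam` times the fraction of discarded observations), the simulated regression function, and all names (`fit_tail`, `estimate_threshold`, `lam`, `min_points`) are assumptions made purely for illustration.

```python
import numpy as np

def fit_tail(x, y, u):
    """Ordinary least squares fit of y on (1, x) using only observations with x > u."""
    mask = x > u
    xs, ys = x[mask], y[mask]
    design = np.column_stack([np.ones_like(xs), xs])
    coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
    resid = ys - design @ coef
    return coef, float(np.mean(resid ** 2)), int(mask.sum())

def estimate_threshold(x, y, grid, lam, min_points=30):
    """Pick the candidate u minimizing an illustrative penalized criterion.

    Criterion (an assumption, not the paper's): tail mean squared residual
    plus lam times the fraction of the sample discarded by the cut at u.
    The penalty favors small thresholds, since any u >= u0 fits equally well.
    """
    best = None
    for u in grid:
        if np.sum(x > u) < min_points:
            continue  # too few points to fit a line reliably
        coef, mse, n_tail = fit_tail(x, y, u)
        crit = mse + lam * (1.0 - n_tail / len(x))
        if best is None or crit < best[0]:
            best = (crit, float(u), coef)
    return best[1], best[2]

# Simulated data: r(x) = 1 + 2x for x > 2, with a smooth nonlinear part below 2.
rng = np.random.default_rng(0)
n, u0, alpha0, beta0 = 4000, 2.0, 1.0, 2.0
x = rng.uniform(0.0, 4.0, size=n)
r = alpha0 + beta0 * x + np.where(x <= u0, 4.0 * (u0 - x) ** 2, 0.0)
y = r + rng.normal(scale=0.2, size=n)

grid = np.linspace(0.5, 3.5, 61)
u_hat, coef_hat = estimate_threshold(x, y, grid, lam=0.05)
u_unpenalized, _ = estimate_threshold(x, y, grid, lam=0.0)

print("penalized threshold estimate:", u_hat)
print("unpenalized threshold estimate:", u_unpenalized)
print("estimated (alpha, beta):", coef_hat)
```

In this sketch the unpenalized criterion is roughly flat for every $u\ge u_0$, so its minimizer is driven by noise. The penalty breaks that tie in favor of the smallest adequate threshold, which is one way to read the abstract's remark on the importance of penalization.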