Linear regression models have been studied extensively in the literature. However, in some practical applications they may not be appropriate over the whole range of the covariate. In this paper, a more flexible model is introduced by considering a regression model $Y=r(X)+\varepsilon$ in which the regression function $r(\cdot)$ is assumed to be linear for large values of the predictor variable $X$. More precisely, we assume that $r(x)=\alpha_0+\beta_0 x$ for $x> u_0$, where $u_0$ is defined as the smallest value for which this property holds. A penalized procedure is introduced to estimate the threshold $u_0$. The proposal is semiparametric, since no parametric model is assumed for the regression function at values smaller than $u_0$. Consistency of both the threshold estimator and the estimators of $(\alpha_0,\beta_0)$ is derived under mild assumptions. A numerical study investigates the small-sample properties of the proposed procedure and the importance of the penalization. The analysis of a real data set illustrates the usefulness of the penalized estimators.
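To make the role of the penalization concrete, the following is a minimal Python sketch of one way a threshold of this kind could be estimated. It is an illustration under stated assumptions, not the procedure of the paper: the criterion (tail mean squared residual plus `lam` times the empirical fraction of observations at or below the candidate threshold), the tuning value `lam = 0.02`, the candidate grid, the simulated regression function, and the helper names `fit_tail_line` and `estimate_threshold` are all hypothetical choices made here. The idea it shows is that, for any candidate $u \ge u_0$, the least squares fit on $\{x > u\}$ has roughly the same residual variance, so an unpenalized criterion cannot single out the smallest such value; a penalty that grows with $u$ breaks the tie.

```python
import numpy as np


def fit_tail_line(x, y, u):
    """Least squares fit of y on (1, x) using only observations with x > u.

    Returns the coefficients (intercept, slope) and the mean squared residual.
    """
    mask = x > u
    design = np.column_stack([np.ones(mask.sum()), x[mask]])
    coef, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
    resid = y[mask] - design @ coef
    return coef, float(np.mean(resid ** 2))


def estimate_threshold(x, y, lam, candidates):
    """Pick the candidate u minimizing a hypothetical penalized criterion.

    Criterion (an assumption of this sketch, not the paper's):
        tail mean squared residual + lam * (fraction of x values <= u).
    The penalty increases with u, so it favors the smallest threshold
    beyond which a straight line fits well.
    """
    best = None
    for u in candidates:
        (alpha, beta), mse = fit_tail_line(x, y, u)
        crit = mse + lam * np.mean(x <= u)
        if best is None or crit < best[0]:
            best = (crit, u, alpha, beta)
    return best[1], best[2], best[3]


# Simulated data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n = 2000
u0, alpha0, beta0 = 0.5, 1.0, 2.0
x = rng.uniform(0.0, 1.0, n)
# Linear for x > u0; a different (here piecewise linear) shape below u0,
# joined continuously at u0.
r = np.where(x > u0, alpha0 + beta0 * x, alpha0 + beta0 * u0 + 8.0 * (u0 - x))
y = r + rng.normal(0.0, 0.1, n)

# Candidate thresholds: empirical quantiles of x, leaving at least 10% of
# the sample in the tail so the line fit stays stable.
candidates = np.quantile(x, np.linspace(0.05, 0.90, 86))
u_hat, alpha_hat, beta_hat = estimate_threshold(x, y, lam=0.02, candidates=candidates)
print(f"u_hat={u_hat:.3f}, alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
```

Setting `lam=0.0` in the call above is a quick way to see why some penalization is needed: the criterion is then nearly flat over all candidates above $u_0$, and the selected threshold is driven by sampling noise in the tail.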