Segmented regression models offer flexibility and interpretability relative to global parametric and nonparametric models, yet they are challenging in both estimation and inference. We consider a four-regime segmented model for temporally dependent data, in which the segmenting boundaries depend on multivariate covariates and the boundary effects are non-diminishing. A mixed integer quadratic programming algorithm is formulated to facilitate least squares estimation of the regression and boundary parameters. The rates of convergence and the asymptotic distributions of the least squares estimators are obtained for the regression coefficients and the boundary coefficients, respectively. We propose a smoothed regression bootstrap to facilitate inference on the parameters, and a model selection procedure to select the most suitable model among those with at most four segments. Numerical simulations and a case study on air pollution in Beijing demonstrate the proposed approach; the case study shows that segmented models with three or four regimes are suitable for modeling the meteorological effects on the PM2.5 concentration.
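To make the model class concrete, the following is a minimal sketch of a four-regime segmented regression under assumptions that the abstract does not state. It assumes the four regimes arise from the sign pattern of two linear boundaries in the covariates, and it holds the boundary coefficients fixed at known values, so it does not implement the mixed integer quadratic programming step, the bootstrap, or the model selection procedure. The data are simulated as independent draws for simplicity, whereas the paper treats temporally dependent data. All names (`regime_labels`, `fit_given_boundaries`, `gamma1`, `gamma2`) are hypothetical.

```python
import numpy as np

def regime_labels(Z, gamma1, gamma2):
    """Label each observation 0..3 from the signs of two linear boundaries.

    Assumption: regimes are defined by 1(Z @ gamma1 > 0) and 1(Z @ gamma2 > 0).
    """
    s1 = (Z @ gamma1 > 0).astype(int)
    s2 = (Z @ gamma2 > 0).astype(int)
    return 2 * s1 + s2

def fit_given_boundaries(X, y, labels, n_regimes=4):
    """Regime-wise ordinary least squares with the boundaries held fixed.

    Returns the coefficient matrix (one row per regime) and the total sum of
    squared residuals over the regimes that were fitted.
    """
    betas = np.full((n_regimes, X.shape[1]), np.nan)
    sse = 0.0
    for k in range(n_regimes):
        mask = labels == k
        if mask.sum() < X.shape[1]:
            continue  # too few observations to fit this regime; row stays NaN
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        betas[k] = beta
        sse += float(np.sum((y[mask] - X[mask] @ beta) ** 2))
    return betas, sse

# Simulated example (independent draws, purely illustrative).
rng = np.random.default_rng(0)
n = 2000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # boundary covariates
X = np.column_stack([np.ones(n), rng.normal(size=n)])       # regression covariates
gamma1 = np.array([0.0, 1.0, -1.0])  # assumed known boundary coefficients
gamma2 = np.array([0.0, 1.0, 1.0])
true_betas = np.array([[1.0, 2.0], [-1.0, 0.5], [3.0, -2.0], [0.0, 1.0]])

labels = regime_labels(Z, gamma1, gamma2)
y = np.sum(X * true_betas[labels], axis=1) + 0.1 * rng.normal(size=n)

betas_hat, sse = fit_given_boundaries(X, y, labels)
print(np.round(betas_hat, 2))
```

With the boundaries known, the problem separates into four ordinary least squares fits. The difficulty addressed in the paper is that the boundary coefficients must be estimated jointly with the regression coefficients, which makes the least squares criterion discontinuous in the boundary parameters.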