This paper motivates and develops a novel and focused approach to variable selection in linear regression models. For estimating the regression mean $\mu=\mathrm{E}\,(Y\mid x_0)$, for the covariate vector of a given individual, there is a list of competing estimators, say $\widehat\mu_S$ for each submodel $S$. Exact expressions are found for the relative mean squared error risks, when compared to the widest model available, say $\mathrm{mse}_S/\mathrm{mse}_{\rm wide}$. The theory of confidence distributions is used for accurate assessments of these relative risks. This leads to certain Focused Relative Risk Information Criterion scores, and associated FRIC plots and FRIC tables, as well as to Confidence plots to exhibit the confidence the data give in the submodels. The machinery is extended to handle many focus parameters at the same time, with appropriate averaged FRIC scores. The particular case where all available covariate vectors have equal importance yields a new overall criterion for variable selection, balancing complexity and fit in a natural fashion. A connection to the Mallows criterion is demonstrated, leading also to natural modifications of the latter. The FRIC and AFRIC strategies are illustrated for real data.
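The setting of competing submodel estimators can be made concrete with a small sketch. The code below is not the paper's FRIC computation, whose formulas are not stated in this abstract. It only illustrates, under stated assumptions, how each submodel $S$ gives its own least squares estimate $\widehat\mu_S$ of the focus parameter at $x_0$, alongside the classical Mallows $C_p$ score that the abstract refers to. The function name `submodel_estimates`, the choice to keep the intercept in every submodel, and the simulated data are all hypothetical.

```python
import itertools
import numpy as np


def submodel_estimates(X, y, x0, protected=(0,)):
    """Fit OLS on every submodel S; return (S, mu_hat_S, Mallows Cp) triples.

    Assumption: columns listed in `protected` (here the intercept) are kept
    in every submodel, and all other columns may be included or excluded.
    """
    n, p = X.shape
    optional = [j for j in range(p) if j not in protected]

    # Residual variance from the widest model, used as the common sigma^2 estimate.
    beta_wide, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_wide = float(np.sum((y - X @ beta_wide) ** 2))
    sigma2 = rss_wide / (n - p)

    results = []
    for k in range(len(optional) + 1):
        for extra in itertools.combinations(optional, k):
            S = sorted(set(protected) | set(extra))
            beta_S, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
            rss_S = float(np.sum((y - X[:, S] @ beta_S) ** 2))
            # Submodel estimate of the focus parameter mu = E(Y | x0).
            mu_hat_S = float(x0[S] @ beta_S)
            # Classical Mallows Cp: RSS_S / sigma^2 - n + 2|S|.
            cp = rss_S / sigma2 - n + 2 * len(S)
            results.append((tuple(S), mu_hat_S, cp))
    return results


# Simulated data, purely for illustration.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, 0.5]) + rng.normal(size=n)
x0 = np.array([1.0, 0.5, -1.0, 0.2])  # covariate vector of the individual in focus

results = submodel_estimates(X, y, x0)
for S, mu_hat, cp in results:
    print(S, round(mu_hat, 3), round(cp, 2))
```

Note that $C_p$ gives one score per submodel regardless of $x_0$, whereas the estimates $\widehat\mu_S$ change with the chosen individual. That dependence on the focus is what a focused criterion is designed to take into account.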