For regression model selection under the maximum likelihood framework, we study the likelihood ratio confidence region for the regression parameter vector of a full regression model. We show that, when the confidence level increases with the sample size at a certain rate, the confidence region contains, with probability tending to one, only vectors representing models that include all active variables, among them the parameter vector of the true model. This result leads to a consistent model selection criterion that has a sparse maximum likelihood interpretation and certain advantages over popular information criteria. It also provides a large-sample characterization of the maximum likelihood models at different model sizes, which shows that, for selection consistency, it suffices to consider only this small set of models.
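The selection idea can be illustrated with a minimal sketch, assuming a Gaussian linear model y = Xβ + ε with unknown error variance and n > p. After profiling out σ², the log-likelihood ratio statistic of a vector β against the full-model MLE is n log(RSS(β)/RSS_full). Because the restricted ML fit on a column subset minimizes the RSS over all vectors supported on that subset, the region contains a vector with that support exactly when the restricted fit lies inside it. Consequently only the maximum likelihood model at each size needs to be checked. The function names `rss` and `select_model` and the parameter `crit` are hypothetical. The rate at which the confidence level must grow is not reproduced here, so the caller supplies the critical value directly (for example, a χ²_p quantile at a level 1 − α_n that increases with n). The exhaustive subset search is only practical for small p.

```python
import itertools

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of the least-squares (Gaussian ML) fit on the given columns."""
    if not cols:
        r = y  # empty model: every coefficient is zero
    else:
        Xs = X[:, list(cols)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        r = y - Xs @ beta
    return float(r @ r)


def select_model(X, y, crit):
    """Hypothetical sketch: return the smallest column subset whose restricted ML fit
    lies in the likelihood ratio region {beta : n*log(RSS(beta)/RSS_full) <= crit}.

    At each size k only the ML (minimum-RSS) subset is tested: if any size-k subset
    is in the region, the ML subset of that size is too. Among subsets of the selected
    size, the one returned therefore has the largest likelihood.
    """
    n, p = X.shape
    rss_full = rss(X, y, tuple(range(p)))
    for k in range(p + 1):
        # Maximum likelihood model among all subsets of size k.
        best_cols, best_rss = None, None
        for cols in itertools.combinations(range(p), k):
            r = rss(X, y, cols)
            if best_rss is None or r < best_rss:
                best_cols, best_rss = cols, r
        # Log-likelihood ratio statistic of that model against the full model.
        lr = n * np.log(best_rss / rss_full)
        if lr <= crit:
            return best_cols
    # Unreachable for crit >= 0, since the full model has lr == 0.
    return tuple(range(p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(200)
    # crit = 20.0 is an illustrative fixed value, not a rate taken from the paper.
    print(select_model(X, y, crit=20.0))
```

With simulated data in which only columns 0 and 2 are active and a moderately large critical value, the sketch should recover those two columns. A critical value of zero always returns the full model, and a very large one returns the empty model. This mirrors the trade-off controlled by the confidence level.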