For regression model selection via maximum likelihood estimation, we adopt a vector representation of candidate models and study the likelihood ratio confidence region for the regression parameter vector of the full model. We show that when the confidence level of this region increases with the sample size at a certain rate, then with probability tending to one the region consists of vectors representing models that contain all active variables, including the true parameter vector of the full model. Using this result, we examine the asymptotic composition of models of maximum likelihood and identify the subset of such models that contain all active variables. We then devise a consistent model selection criterion that has a sparse maximum likelihood estimation interpretation and certain advantages over popular information criteria.
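The idea of selecting a sparse model from a likelihood ratio confidence region can be illustrated with a minimal sketch. It is not the criterion proposed in the paper: it assumes a Gaussian linear model without intercept, enumerates every subset of predictors by brute force, keeps the submodels whose likelihood ratio statistic against the full model falls below a cutoff, and returns the smallest retained submodel (ties broken by the smaller statistic). The function names `max_loglik` and `select_sparsest` are hypothetical, and the cutoff `p * log(n)` is only an illustrative choice of a threshold that grows with the sample size, not the rate derived in the paper.

```python
import itertools
import numpy as np


def max_loglik(X, y, subset):
    """Maximized Gaussian log-likelihood of the linear submodel using columns `subset`."""
    n = y.shape[0]
    if len(subset) > 0:
        Xs = X[:, list(subset)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)  # least squares = Gaussian MLE
        resid = y - Xs @ beta
    else:
        resid = y  # empty model: no predictors and no intercept (assumption)
    sigma2 = float(resid @ resid) / n  # MLE of the error variance
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)


def select_sparsest(X, y, threshold):
    """Return the sparsest submodel whose LR statistic vs. the full model is <= threshold."""
    p = X.shape[1]
    full = max_loglik(X, y, tuple(range(p)))
    region = []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            lr = 2.0 * (full - max_loglik(X, y, subset))
            if lr <= threshold:
                region.append((len(subset), lr, subset))
    # The full model always has lr == 0, so `region` is never empty.
    _, _, best = min(region)  # fewest variables first, then smallest LR statistic
    return best, [s for _, _, s in region]


# Illustrative simulated data: variables 0 and 2 are active, the rest are noise.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

best, region = select_sparsest(X, y, threshold=p * np.log(n))
print("selected variables:", best)
print("submodels retained:", len(region))
```

With a cutoff that grows slowly in the sample size, submodels missing an active variable tend to be excluded because their likelihood ratio statistic grows linearly in the sample size, while submodels containing all active variables are retained. Exhaustive enumeration is only feasible for a small number of predictors.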