Many problems in statistics and machine learning can be cast as model selection problems, where the goal is to choose an optimal parsimonious model from a set of candidate models. Model selection is typically conducted by penalizing the objective function via information criteria (IC), as in the pioneering work of Akaike and Schwarz. Building on recent developments, we propose a generalized IC framework for consistent model selection in general loss-based learning problems. Utilizing these developments, we derive a consistent estimation method for Generalized Linear Model (GLM) regressions. We further advance the generalized IC framework by considering model selection problems in which the candidate set may be uncountable. Alongside the theoretical exposition, we provide a computational procedure for implementing our methods in the finite sample setting, which we demonstrate via an extensive simulation study.
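As a minimal illustration of the kind of IC-penalized selection the abstract refers to (this is the classical BIC for Gaussian linear regression, not the paper's generalized criterion; the function names and simulated data below are hypothetical), one can exhaustively score candidate predictor subsets and keep the one minimizing the penalized objective:

```python
import itertools
import numpy as np

def bic_ols(X, y):
    """BIC of a Gaussian linear model fit by least squares: -2*loglik + k*log(n)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # profile log-likelihood at the MLE sigma^2 = RSS / n
    ll = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    return -2.0 * ll + k * np.log(n)

def select_by_bic(X, y):
    """Exhaustive search over nonempty predictor subsets; return the BIC minimizer."""
    p = X.shape[1]
    candidates = [s for r in range(1, p + 1)
                  for s in itertools.combinations(range(p), r)]
    return min(candidates, key=lambda s: bic_ols(X[:, list(s)], y))

# Simulated example: only the first two of four predictors are active.
rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

best = select_by_bic(X, y)
print("selected predictors:", best)
```

Because the BIC penalty grows like log(n) per parameter, this selector is consistent in the classical fixed-dimension setting: with strong signal, the chosen subset contains the truly active predictors with high probability. The paper's framework generalizes the penalty beyond likelihood-based losses and beyond finite candidate sets.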