We discuss an information criterion for determining the number of explanatory variables in subset regression modeling. Information criteria such as AIC are effective and widely used for model selection in ordinary regression models and other statistical models. With the recent growth of data science, the analysis of large-scale data has become increasingly important. When a model is constructed heuristically from a very large number of candidate explanatory variables, spurious correlations may be picked up and inappropriate variables adopted. In this paper, we identify problems specific to subset regression from the viewpoint of bias correction for the log-likelihood and propose a correction method that accounts for them.
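To make the selection problem concrete, the sketch below simulates a response y and p candidate regressors that are all pure noise. It is an illustration only, not the correction method proposed in this paper. The no-intercept Gaussian model, the sample sizes, and the helper names `loglik_gain` and `simulate` are hypothetical choices. For a variable fixed in advance, twice the gain in maximized log-likelihood behaves roughly like a chi-squared variable with one degree of freedom, with mean about 1. This is the scale of optimism that AIC's per-parameter penalty is calibrated for. When the best of p noise candidates is selected, the average gain is several times larger. A penalty that counts only the number of adopted parameters therefore understates the bias.

```python
import math
import random


def loglik_gain(y, x):
    """Hypothetical helper: twice the increase in the maximized Gaussian
    log-likelihood when a single no-intercept regressor x is added to the
    empty model for y. With sigma^2 estimated by RSS/n, this equals
    n * log(RSS0 / RSS1)."""
    n = len(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    rss0 = sum(b * b for b in y)          # residual sum of squares, empty model
    rss1 = rss0 - sxy * sxy / sxx         # residual sum of squares after fitting x
    return n * math.log(rss0 / rss1)


def simulate(n=80, p=30, reps=300, seed=0):
    """Hypothetical experiment: y and all p candidates are independent noise.
    Returns the average gain for one variable fixed in advance (candidate 0)
    and for the best candidate chosen after looking at the data."""
    rng = random.Random(seed)
    fixed_total = 0.0
    best_total = 0.0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        gains = []
        for _ in range(p):
            x = [rng.gauss(0.0, 1.0) for _ in range(n)]
            gains.append(loglik_gain(y, x))
        fixed_total += gains[0]           # prespecified variable
        best_total += max(gains)          # data-driven subset selection
    return fixed_total / reps, best_total / reps


if __name__ == "__main__":
    fixed, best = simulate()
    print(f"prespecified variable: {fixed:.2f}, best of 30: {best:.2f}")
```

The prespecified average stays near 1 regardless of p. The best-of-p average grows with the number of candidates. This gap is the selection-induced bias that a subset-regression criterion has to account for.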