This paper shows that the degree of approximate multicollinearity in a linear regression model increases simply as independent variables are added, even when these variables are not highly linearly related. Now that it is relatively easy to fit linear models with a large number of independent variables, this effect can lead to the erroneous conclusion that the model suffers from a worrying degree of approximate multicollinearity. To avoid this, an adjusted variance inflation factor is proposed that compensates for the presence of a large number of independent variables in the multiple linear regression model. This proposal is shown to have a direct impact on variable selection models based on influence relationships. It yields a new decision criterion for the individual significance test, which can be applied in stepwise regression models or directly in a multiple linear regression model.
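The central claim, that multicollinearity diagnostics worsen merely because more regressors are present, can be illustrated with a small simulation. The sketch below is not the adjusted VIF proposed in the paper, whose formula is not given in this abstract. It computes the ordinary variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing column j on an intercept and all other columns. It then applies this to designs whose columns are mutually independent standard normal draws. The sample size `n = 50`, the values of `p`, the seed, and the helper names `vif` and `mean_vif_by_p` are illustrative assumptions.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Ordinary variance inflation factor for each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination
    from regressing column j on an intercept plus all remaining columns.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Auxiliary design: intercept + every other regressor.
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def mean_vif_by_p(n: int = 50, ps=(2, 10, 25, 40), seed: int = 0) -> dict:
    """Mean VIF of an n x p design with independent N(0, 1) columns, for each p.

    The columns are generated independently, so any growth in VIF with p
    comes from the number of regressors rather than from real linear relations.
    """
    rng = np.random.default_rng(seed)
    return {p: float(vif(rng.standard_normal((n, p))).mean()) for p in ps}


if __name__ == "__main__":
    for p, m in mean_vif_by_p().items():
        print(f"p = {p:2d}: mean VIF = {m:.2f}")
```

With n held fixed, the auxiliary R_j² values tend to rise as p grows, because each column is fitted with more free coefficients. The printed mean VIF therefore tends to climb toward conventional warning thresholds even though no real linear relation exists among the columns. A correction that accounts for the number of regressors would target exactly this effect.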