When multiple models are considered in a regression problem, model averaging can be used to weight and combine them. In the present study, we examined how the prediction accuracy of the estimator depends on the dimensionality of the explanatory variables when a generalization of the model averaging method is applied to a linear model. We specifically considered the case of high-dimensional explanatory variables, with multiple linear models fitted to subsets of these variables. For this setting, we derived the optimal weights that yield the best predictions. We also observed that the double-descent phenomenon occurs in the model averaging estimator. Furthermore, we obtained theoretical results by adapting methods such as the random forest to linear regression models. Finally, we verified these results through numerical experiments.
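As a rough illustration of the setting described above, the following minimal sketch fits several linear models on subsets of a high-dimensional design matrix and averages their predictions. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian data, the random choice of subsets, the minimum-norm least-squares fit, the uniform weights (the optimal weights derived in the paper are not reproduced), and the helper names `fit_subset_models` and `average_predict`.

```python
import numpy as np


def fit_subset_models(X, y, subsets):
    """Fit one minimum-norm least-squares model per column subset.

    Each model's coefficients are embedded in a length-p vector that is
    zero outside that model's subset.
    """
    coefs = []
    for idx in subsets:
        beta = np.zeros(X.shape[1])
        # pinv gives the least-squares (or minimum-norm) solution
        beta[idx] = np.linalg.pinv(X[:, idx]) @ y
        coefs.append(beta)
    return np.array(coefs)  # shape (n_models, p)


def average_predict(X, coefs, weights):
    """Predict with the weighted combination of the subset models."""
    return X @ (weights @ coefs)


# Hypothetical synthetic data: n samples, p variables, m models of size k.
rng = np.random.default_rng(0)
n, p, k, m = 50, 200, 20, 10
beta_true = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(n, p))

# Assumption: subsets are drawn uniformly at random without replacement.
subsets = [rng.choice(p, size=k, replace=False) for _ in range(m)]
coefs = fit_subset_models(X, y, subsets)

# Assumption: uniform weights, used only as a placeholder.
weights = np.full(m, 1.0 / m)
pred = average_predict(X_test, coefs, weights)
```

Varying `k` relative to `n` in a sketch like this is one way to explore numerically how the test error changes with subset dimensionality, though the behaviour seen will depend on the assumed data model.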