An important issue in many multivariate regression problems is eliminating candidate predictors with null predictor vectors. In the large-dimensional (LD) setting, where the numbers of responses and predictors are both large, model selection faces a scalability challenge. Knock-one-out (KOO) statistics have the potential to meet this challenge. In this paper, we derive the strong consistency and the central limit theorem of the KOO statistics under the LD setting and a mild distributional assumption on the errors (finite fourth moments). These theoretical results lead us to propose a subset selection rule based on the KOO statistics with a bootstrap threshold. Simulation results support our conclusions and demonstrate that the KOO approach with the bootstrap threshold achieves better selection probabilities than the methods using the Akaike information threshold, the Bayesian information threshold, and Mallows's $C_p$ threshold. We apply the proposed KOO approach and the approaches based on information thresholds to a chemometrics dataset and a yeast cell-cycle dataset; the results suggest that our proposed method identifies useful models.
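The knock-one-out idea summarized above can be illustrated with a minimal sketch. The abstract does not define the KOO statistic or the bootstrap threshold, so everything below is an assumption made for illustration: the statistic is taken to be the increase in the log-determinant of the residual covariance when one predictor is removed, the threshold is simply supplied by the caller, and the names `koo_statistics` and `koo_select` are hypothetical.

```python
import numpy as np


def koo_statistics(X, Y):
    """Return one knock-one-out statistic per column of X.

    Assumed form (not taken from the paper): for predictor j,
    log det(S_{-j}) - log det(S), where S is the residual covariance of the
    full least-squares fit and S_{-j} is the one obtained without column j.
    """
    n, k = X.shape

    def residual_logdet(design):
        # Ordinary least-squares fit of all responses on the given design.
        coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
        resid = Y - design @ coef
        # slogdet is numerically safer than log(det(...)).
        _, logdet = np.linalg.slogdet(resid.T @ resid / n)
        return logdet

    full = residual_logdet(X)
    return np.array(
        [residual_logdet(np.delete(X, j, axis=1)) - full for j in range(k)]
    )


def koo_select(X, Y, threshold):
    """Keep predictors whose statistic exceeds a caller-supplied threshold.

    The paper uses a bootstrap threshold; here it is just a number.
    """
    return np.flatnonzero(koo_statistics(X, Y) > threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, p = 200, 5, 3
    X = rng.standard_normal((n, k))
    B = np.zeros((k, p))
    B[:2] = 2.0  # only the first two predictors carry signal
    Y = X @ B + rng.standard_normal((n, p))
    print(koo_statistics(X, Y))
    print(koo_select(X, Y, threshold=0.2))
```

Each predictor is scored with one refit that does not depend on the other candidates, which is the reason a knock-one-out rule needs only as many fits as there are predictors instead of a search over all subsets.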