We propose a new method, aggregated sure independence screening, to address the computational challenges of variable selection for interactions when the number of explanatory variables far exceeds the number of observations (i.e., $p\gg n$). The two main challenges in this problem are the strong hierarchical restriction and the large number of candidate main effects and interactions. If $n$ is a few hundred and $p$ is ten thousand, the augmented matrix of the full model requires more than $100\,{\rm GB}$ of memory, which exceeds the capacity of a personal computer. The proposed method overcomes this issue, whereas competing methods cannot. It has two advantages: it can include important interactions even when the related main effects are weak or absent, and it can be combined with an arbitrary variable selection method for interactions. The research thus addresses the main concern in variable selection of interactions, because it makes previous methods applicable when $p$ is extremely large.
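The memory figure quoted in the abstract can be checked with a rough back-of-the-envelope calculation. The sketch below is illustrative only and rests on assumptions not stated in the source: $n = 300$ as one reading of "a few hundred", dense storage in 8-byte double precision, and an augmented matrix containing the $p$ main effects plus all $p(p-1)/2$ pairwise interactions (no squared terms). The function name `augmented_matrix_gb` is hypothetical.

```python
from math import comb


def augmented_matrix_gb(n, p, bytes_per_entry=8):
    """Approximate size, in GB (10^9 bytes), of a dense n x (p + C(p, 2)) matrix.

    Assumption: columns are the p main effects plus all pairwise
    interactions, each entry stored as a double (8 bytes).
    """
    n_columns = p + comb(p, 2)  # main effects + pairwise interactions
    return n * n_columns * bytes_per_entry / 1e9


# Assumed values: n = 300 ("a few hundred"), p = 10,000.
size_gb = augmented_matrix_gb(300, 10_000)
print(f"augmented matrix: about {size_gb:.0f} GB")
```

Under these assumptions the estimate comes out above the 100 GB threshold mentioned in the abstract; a smaller $n$ or single-precision storage would lower it proportionally.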