Estimating a prediction function is a fundamental component of many data analyses. The Super Learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished within the ensemble by applying variable screening algorithms, including the lasso, before fitting the other prediction algorithms. However, the performance of a Super Learner that uses the lasso for dimension reduction has not been fully explored in settings where the lasso is known to perform poorly. Our empirical results suggest that a diverse set of candidate screening algorithms should be used to protect against poor performance of any one screen, mirroring the guidance for choosing a library of prediction algorithms for the Super Learner.
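To make the idea of screening inside the ensemble concrete, the following is a minimal sketch in Python with scikit-learn, not the implementation evaluated in this work. It pairs each of several candidate screens (a lasso screen, a univariate F-test screen, and no screening) with each of several learners, then stacks all screen-learner pairs. The function name `build_screened_stack`, the choice of screens and learners, and the parameter `k_best` are illustrative assumptions. A non-negative linear meta-learner is used as a rough stand-in for the Super Learner's convex combination; its weights are not normalized to sum to one.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_regression
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.pipeline import Pipeline


def build_screened_stack(k_best=10, cv=5, random_state=0):
    """Stack every (screen, learner) pair; hypothetical helper for illustration."""
    # Candidate screens: factories so that each pipeline gets a fresh object.
    screens = {
        # Keep variables with a nonzero cross-validated lasso coefficient.
        "lasso": lambda: SelectFromModel(LassoCV(cv=cv, random_state=random_state)),
        # Keep the k_best variables ranked by a univariate F-test.
        "univariate": lambda: SelectKBest(f_regression, k=k_best),
        # No screening: pass every variable through to the learner.
        "all": lambda: "passthrough",
    }
    # Candidate prediction algorithms (an assumed, deliberately small library).
    learners = {
        "ols": lambda: LinearRegression(),
        "rf": lambda: RandomForestRegressor(n_estimators=50, random_state=random_state),
    }
    # The screen sits inside the pipeline, so it is refit within each
    # cross-validation fold and never sees the held-out observations.
    candidates = [
        (f"{s_name}_{l_name}",
         Pipeline([("screen", make_screen()), ("learner", make_learner())]))
        for s_name, make_screen in screens.items()
        for l_name, make_learner in learners.items()
    ]
    # Non-negative least squares on the cross-validated predictions.
    meta = LinearRegression(positive=True, fit_intercept=False)
    return StackingRegressor(estimators=candidates, final_estimator=meta, cv=cv)


if __name__ == "__main__":
    # Synthetic data, used only to show the calls.
    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           noise=5.0, random_state=0)
    model = build_screened_stack().fit(X, y)
    for (name, _), weight in zip(model.estimators, model.final_estimator_.coef_):
        print(f"{name}: {weight:.3f}")
```

Because every screen appears alongside every learner, the meta-learner can shift weight away from pairs whose screen performs poorly on the data at hand, which is the protection that a diverse set of screens is meant to provide. If the lasso screen selects no variables in some fold, the downstream learner would fail, so a production version would need a fallback for that case.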