Estimating a prediction function is a fundamental component of many data analyses. The super learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms (screeners), including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a super learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. Our empirical results suggest that a diverse set of candidate screeners should be used to protect against poor performance of any one screener, mirroring the guidance for choosing a library of prediction algorithms for the super learner. These results are further illustrated through the analysis of HIV-1 antibody data.
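The idea of pairing several screeners with several prediction algorithms can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation used for the results summarized above: it uses only the Python standard library, it builds a discrete super learner (the single screener-learner pair with the lowest cross-validated risk) rather than a weighted ensemble, and it substitutes a simple correlation-based screener and a k-nearest-neighbors learner for the lasso and the larger algorithm library. All function names, the simulated data, and the tuning values are hypothetical.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

# Screeners (hypothetical): map training data to the retained column indices.
def screen_all(X, y):
    return list(range(len(X[0])))

def screen_top_corr(X, y, k=2):
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Learners (hypothetical): fit on training data and return a prediction function.
def fit_mean(X, y):
    mu = sum(y) / len(y)
    return lambda row: mu

def fit_knn(X, y, k=5):
    def predict(row):
        dist = sorted((sum((a - b) ** 2 for a, b in zip(r, row)), yi)
                      for r, yi in zip(X, y))
        return sum(yi for _, yi in dist[:k]) / k
    return predict

def cv_risk(X, y, screener, learner, n_folds=5):
    """Cross-validated mean squared error.

    The screener is rerun inside every training fold, so the held-out
    observations never influence which variables are kept.
    """
    n = len(y)
    sq_err = 0.0
    for f in range(n_folds):
        train = [i for i in range(n) if i % n_folds != f]
        test = [i for i in range(n) if i % n_folds == f]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        cols = screener(X_tr, y_tr)
        predict = learner([[row[j] for j in cols] for row in X_tr], y_tr)
        for i in test:
            sq_err += (predict([X[i][j] for j in cols]) - y[i]) ** 2
    return sq_err / n

def discrete_super_learner(X, y, screeners, learners):
    """Return the screener-learner pair with the lowest CV risk, plus all risks."""
    risks = {(s, l): cv_risk(X, y, screeners[s], learners[l])
             for s in screeners for l in learners}
    return min(risks, key=risks.get), risks

# Simulated data (an assumption): 10 features, only the first two matter.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]
y = [2 * row[0] - row[1] + rng.gauss(0, 0.5) for row in X]

screeners = {"all": screen_all, "top_corr": screen_top_corr}
learners = {"mean": fit_mean, "knn": fit_knn}
best, risks = discrete_super_learner(X, y, screeners, learners)
```

Because every screener is crossed with every learner, a screener that discards important variables in a given data set simply receives a high cross-validated risk and is not selected, which is the protective effect a diverse screener library is meant to provide. A full super learner would replace the final `min` with a weighted combination of the candidates' cross-validated predictions.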