Stability selection is a widely adopted resampling-based framework for high-dimensional structure estimation and variable selection. However, the concept of 'stability' is often addressed narrowly, primarily by examining selection frequencies, or 'stability paths'. This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of the stability selection framework, moving beyond single-variable analysis. We suggest that the stability estimator offers two advantages: it can serve as a reference that reflects the robustness of the results obtained, and it can help identify an optimal regularization value that improves stability. By determining this value, we aim to calibrate key stability selection parameters, namely the decision threshold and the expected number of falsely selected variables, within established theoretical bounds. Furthermore, we explore a novel selection criterion based on this regularization value. Because the asymptotic distribution of the stability estimator has been established previously, its convergence to the true stability is ensured, which allows us to observe stability trends over successive sub-samples. This approach sheds light on the required number of sub-samples, addressing a notable gap in prior studies. We have developed the 'stabplot' package to facilitate the use of the plots featured in this manuscript and to support their integration into further statistical analysis and research workflows.
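The abstract leaves the estimator and the calibration rule implicit, so the following Python sketch rests on stated assumptions. It assumes the established estimator is that of Nogueira, Sechidis and Brown (2018) and that the theoretical bounds are the Meinshausen and Bühlmann (2010) bound on the expected number of falsely selected variables, E(V) <= q^2 / ((2*pi_thr - 1) * p) for pi_thr in (0.5, 1], where q is the expected number of variables selected per sub-sample and p the number of variables. The sketch computes the estimator from a binary selection matrix (one row per sub-sample), tracks it over successive sub-samples, picks the regularization value with the highest estimated stability (an assumed criterion, not necessarily the one proposed here), and solves the bound for the decision threshold. All function names are hypothetical and are not the 'stabplot' API; the toy matrices are invented for illustration and are not results from this paper.

```python
# Sketch under stated assumptions; not the stabplot implementation.
from statistics import fmean


def stability_estimate(Z):
    """Stability estimate for a binary selection matrix Z (assumed: Nogueira et al., 2018).

    Z holds M rows, one per sub-sample; each row has p 0/1 flags marking
    which variables were selected on that sub-sample. The value is 1.0 when
    every sub-sample selects the same variables; smaller means less stable.
    """
    M, p = len(Z), len(Z[0])
    if M < 2:
        raise ValueError("need at least two sub-samples")
    # Selection frequency of each variable across the M sub-samples.
    freqs = [sum(row[f] for row in Z) / M for f in range(p)]
    # Unbiased variance of each column of Z.
    variances = [M / (M - 1) * h * (1 - h) for h in freqs]
    # Average number of variables selected per sub-sample.
    k_bar = fmean(sum(row) for row in Z)
    denom = (k_bar / p) * (1 - k_bar / p)
    if denom == 0:
        raise ValueError("undefined when no variable, or every variable, is selected")
    return 1 - fmean(variances) / denom


def stability_trend(Z):
    """Stability estimated from the first m sub-samples, for m = 2, ..., M."""
    return [stability_estimate(Z[:m]) for m in range(2, len(Z) + 1)]


def most_stable_lambda(selections):
    """Hypothetical rule: the regularization value whose selection matrix
    has the highest estimated stability. `selections` maps lambda -> Z."""
    return max(selections, key=lambda lam: stability_estimate(selections[lam]))


def pfer_bound(q, p, pi_thr):
    """Assumed Meinshausen-Buhlmann bound E(V) <= q**2 / ((2 * pi_thr - 1) * p),
    with q the expected number of variables selected per sub-sample."""
    if not 0.5 < pi_thr <= 1:
        raise ValueError("the bound requires 0.5 < pi_thr <= 1")
    return q * q / ((2 * pi_thr - 1) * p)


def threshold_for_pfer(q, p, ev):
    """Smallest decision threshold whose bound on E(V) does not exceed ev."""
    pi_thr = 0.5 * (1 + q * q / (ev * p))
    if pi_thr > 1:
        raise ValueError("no threshold in (0.5, 1] attains this E(V) for these q and p")
    return pi_thr


# Toy selection matrices over p = 4 variables for two regularization values.
Z_a = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]  # identical selections
Z_b = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]  # inconsistent selections
print(stability_estimate(Z_a), stability_estimate(Z_b))
print(most_stable_lambda({0.1: Z_a, 0.5: Z_b}))
print(stability_trend(Z_b))
print(pfer_bound(q=20, p=1000, pi_thr=0.9), threshold_for_pfer(q=20, p=1000, ev=0.5))
```

Solving the bound for the threshold rather than for E(V) mirrors the calibration idea in the abstract: once a regularization value fixes q, the bound links the decision threshold and the tolerated number of false selections, so choosing one determines the other.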