Stability selection is an attractive approach for identifying sparse sets of features jointly associated with an outcome in high-dimensional settings. We introduce an automated calibration procedure that maximises an in-house stability score and accommodates data with an a priori known block structure (e.g. multi-OMIC data). The procedure applies to LASSO-penalised regression and to graphical models. In simulations, our approach outperforms both non-stability-based approaches and stability selection using the original calibration. Applying the multi-block graphical LASSO to real epigenetic and transcriptomic data from the Norwegian Women and Cancer study reveals a central, credible, and novel cross-OMIC role of LRRN3 in the biological response to smoking. The proposed approaches are implemented in the R package sharp.