Guiding adaptive shrinkage by co-data to improve regression-based prediction and feature selection

The high-dimensional nature of genomics data complicates feature selection, particularly in studies with small sample sizes, which are not uncommon in clinical prediction settings. It is widely recognized that complementary data on the features, 'co-data', may improve results. Examples are prior feature groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories. Yet the uptake of learning methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, which are crucial for the performance of those learners. We discuss technical aspects, as well as applicability in terms of the types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating feature selection. Finally, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do-it-yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving feature selection in genetics studies.
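
To make the idea of group-adaptive shrinkage concrete, the following is a minimal sketch rather than any of the reviewed methods: a ridge regression in which each co-data group of features receives its own penalty. The function name `group_ridge`, the simulated data, and the hand-picked penalty values are all assumptions for illustration. In the methods under review, the group penalties would be learned from the data and co-data rather than fixed by hand.

```python
import numpy as np

def group_ridge(X, y, groups, lambdas):
    """Ridge regression with one penalty per co-data group (illustrative helper).

    groups[j] is the group index of feature j; lambdas[g] is the penalty
    applied to every feature in group g.
    """
    # Expand the per-group penalties to one penalty per feature.
    penalties = np.asarray(lambdas, dtype=float)[np.asarray(groups)]
    # Solve (X'X + diag(penalties)) beta = X'y.
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)

# Simulated data (assumption): 10 features, only the first 5 carry signal.
rng = np.random.default_rng(0)
n, p = 40, 10
X = rng.standard_normal((n, p))
beta_true = np.concatenate([rng.standard_normal(5), np.zeros(5)])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Co-data (assumption): a prior grouping that separates the two feature sets.
groups = np.array([0] * 5 + [1] * 5)

# Penalties chosen by hand here; group 1 is shrunk much harder than group 0.
beta_adaptive = group_ridge(X, y, groups, lambdas=[1.0, 100.0])
beta_uniform = group_ridge(X, y, groups, lambdas=[1.0, 1.0])
```

With equal penalties the function reduces to ordinary ridge regression. Raising the penalty for one group pulls that group's coefficients closer to zero, which is the mechanism that co-data-guided methods exploit when the grouping is informative.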

