Feature selection remains a major challenge in medical prediction, where existing approaches such as LASSO often lack robustness and interpretability. We introduce GRASP, a novel framework that couples Shapley-value-driven attribution with group $L_{21}$ regularization to extract compact, non-redundant feature sets. GRASP first distills group-level importance scores from a pretrained tree model via SHAP, then enforces structured sparsity through group $L_{21}$-regularized logistic regression, yielding stable and interpretable selections. Extensive comparisons with LASSO, SHAP, and deep-learning-based methods show that GRASP consistently delivers comparable or superior predictive accuracy while identifying fewer, less redundant, and more stable features.
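To make the second stage concrete, the following is a minimal sketch of group $L_{21}$-regularized logistic regression solved by proximal gradient descent. It is an illustration under stated assumptions, not the GRASP implementation. The function names (`group_soft_threshold`, `fit_group_sparse_logreg`), the solver, and the way group importance scores enter the penalty (weights proportional to the inverse importance, normalized to mean one) are all hypothetical choices made here. The importance scores are taken as given inputs; in the pipeline described above they would come from SHAP values of a pretrained tree model, aggregated per group.

```python
import numpy as np


def group_soft_threshold(w, groups, thresholds):
    """Block soft-thresholding: the proximal operator of a weighted group L21 penalty."""
    out = w.copy()
    for idx, t in zip(groups, thresholds):
        norm = np.linalg.norm(w[idx])
        # A group whose norm is at most t is zeroed out entirely;
        # otherwise the whole group is shrunk toward zero by a common factor.
        out[idx] = 0.0 if norm <= t else (1.0 - t / norm) * w[idx]
    return out


def fit_group_sparse_logreg(X, y, groups, importance, lam=0.1, n_iter=500):
    """Minimize mean logistic loss + lam * sum_g c_g * ||w_g||_2 by proximal gradient.

    groups:     list of index lists, one per feature group.
    importance: one non-negative score per group (assumed to be precomputed,
                e.g. an aggregated mean |SHAP| value per group).
    The weighting c_g = 1 / importance_g (rescaled to mean 1) is an assumption
    of this sketch: more important groups are penalized less.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0

    c = 1.0 / (np.asarray(importance, dtype=float) + 1e-8)
    c = c / c.mean()

    # Step size 1/L, using the bound L <= ||[X, 1]||_2^2 / (4 n) for the logistic loss.
    X_aug = np.hstack([X, np.ones((n, 1))])
    step = 4.0 * n / (np.linalg.norm(X_aug, 2) ** 2 + 1e-12)

    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        # Gradient step on the smooth loss, then the group-wise proximal step.
        w = group_soft_threshold(w - step * grad_w, groups, step * lam * c)
        b = b - step * grad_b  # the intercept is not penalized
    return w, b
```

Groups whose coefficient block is exactly zero after fitting are the ones discarded, so the selected feature set is the union of the surviving groups. Raising `lam` removes more groups; how `lam` and the penalty weights should be chosen is left open in this sketch.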