Factor importance measures the impact of each feature on the accuracy of output prediction. Many existing works focus on model-based importance, but a feature that is important for one learning algorithm may hold little significance for another. Hence, a factor importance measure ought to characterize a feature's predictive potential without relying on a specific prediction algorithm. Such algorithm-agnostic importance is termed intrinsic importance in Williamson et al. (2023), but their estimator still requires model fitting. To bypass the modeling step, we establish the equivalence between predictive potential and total Sobol' indices from global sensitivity analysis, and introduce a novel consistent estimator that can be computed directly from noisy data. Integrating this estimator with forward selection and backward elimination gives rise to FIRST, Factor Importance Ranking and Selection using Total (Sobol') indices. Extensive simulations demonstrate the effectiveness of FIRST on regression and binary classification problems, and its clear advantage over state-of-the-art methods.
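To make the idea of a model-free importance estimate concrete, the following is a minimal sketch of one way a total-Sobol'-style index could be estimated directly from noisy data. It is not the estimator proposed in this work. It assumes a simple first-nearest-neighbor estimate of the expected conditional variance, E[Var(Y | X_S)] ≈ (1/2n) Σ (Y_i − Y_nn(i))², where nn(i) is the nearest neighbor of sample i in the coordinates S. The importance of factor k is then taken as the increase in that residual variance when factor k is dropped, divided by Var(Y). The function names `nn_residual_variance` and `total_importance` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def nn_residual_variance(X, y):
    """Estimate E[Var(y | X)] from each sample's nearest neighbor in X.

    Illustrative 1-nearest-neighbor estimator (an assumption of this sketch).
    """
    if X.shape[1] == 0:
        # No conditioning variables left: fall back to the plain variance.
        return float(np.var(y))
    # Brute-force pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nn = np.argmin(d2, axis=1)
    # Half the mean squared difference between neighboring responses.
    return float(0.5 * np.mean((y - y[nn]) ** 2))


def total_importance(X, y):
    """Normalized increase in residual variance when each factor is dropped."""
    var_y = np.var(y)
    full = nn_residual_variance(X, y)
    out = np.empty(X.shape[1])
    for k in range(X.shape[1]):
        X_minus_k = np.delete(X, k, axis=1)
        out[k] = (nn_residual_variance(X_minus_k, y) - full) / var_y
    return out


# Toy data (hypothetical): x0 matters most, x1 less, x2 is inert.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(size=(n, 3))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=n)

importance = total_importance(X, y)
print(np.round(importance, 3))
```

On this toy example the estimated importances should rank x0 above x1 and leave the inert x2 close to zero. No regression model is fitted at any point. The brute-force distance matrix keeps the sketch short but scales quadratically in the sample size, so a tree-based neighbor search would be preferable for larger data.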