Factor importance measures the impact of each feature on output prediction accuracy. Many existing works focus on model-based importance, but a feature that is important under one learning algorithm may hold little significance under another. Hence, a factor importance measure ought to characterize a feature's predictive potential without relying on a specific prediction algorithm. Such algorithm-agnostic importance is termed intrinsic importance in Williamson et al. (2023), but their estimator again requires model fitting. To bypass the modeling step, we establish the equivalence between predictive potential and total Sobol' indices from global sensitivity analysis, and introduce a novel consistent estimator that can be computed directly from noisy data. Integrating this estimator with forward selection and backward elimination gives rise to FIRST, Factor Importance Ranking and Selection using Total (Sobol') indices. Extensive simulations demonstrate the effectiveness of FIRST on regression and binary classification problems, and show a clear advantage over state-of-the-art methods.
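For readers unfamiliar with total Sobol' indices, the following minimal sketch may help fix ideas. It is not the estimator proposed here: it uses the classical Jansen Monte Carlo formula, which assumes independent uniform inputs and a function that can be evaluated at arbitrary points, whereas FIRST works directly from noisy data. The names `total_sobol_jansen` and `toy` are hypothetical and introduced only for illustration.

```python
import numpy as np


def total_sobol_jansen(f, d, n=20000, seed=0):
    """Estimate total Sobol' indices with the classical Jansen formula.

    Assumptions (illustrative, not from the paper): the d inputs are
    independent U(0, 1), and f is vectorized over the rows of an (n, d) array.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n, d))  # base sample
    B = rng.random((n, d))  # independent sample used for resampling
    fA = f(A)
    total_var = fA.var()
    T = np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]  # resample coordinate i only, keep the rest fixed
        # Jansen: E[(f(A) - f(AB_i))^2] / 2 estimates E[Var(f | X_{-i})]
        T[i] = 0.5 * np.mean((fA - f(AB)) ** 2) / total_var
    return T


def toy(X):
    # Hypothetical test function: x0 dominates, x1 is weak, x2 is inert.
    return 4.0 * X[:, 0] + X[:, 1] ** 2 + 0.0 * X[:, 2]


if __name__ == "__main__":
    print(total_sobol_jansen(toy, d=3))
```

For this additive toy function the exact total indices are 15/16, 1/16, and 0, so the estimates should rank the three factors in that order and assign the inert factor an index of zero.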