Feature and Interaction Importance (FII) methods are essential in supervised learning for assessing the relevance of input variables and their interactions in complex prediction models. In many domains, such as personalized medicine, practitioners need local interpretations of individual predictions rather than global scores that summarize overall feature importance. Random Forests (RFs) are widely used in these settings, and existing interpretability methods typically exploit tree structures and split statistics to provide model-specific insights. However, theoretical understanding of local FII methods for RFs remains limited, so it is unclear how to interpret a high importance score for an individual prediction. We propose a novel local, model-specific FII method that identifies features that frequently co-occur along decision paths, combining global patterns with those observed on the paths specific to a given test point. We prove that, under a Locally Spike Sparse (LSS) model, our method consistently recovers the true local signal features and their interactions, and also identifies whether large or small feature values drive a prediction. We illustrate the usefulness of our method and theoretical results through simulation studies and a real-world data example.
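To make the idea of feature co-occurrence along decision paths concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a hypothetical toy tree format (nested dicts with keys `feature`, `threshold`, `left`, `right`, and `value` for leaves) and hypothetical helper names `signed_path_features` and `local_cooccurrence`. For a single test point, it records which features are split on along the point's path in each tree, together with the side of the split the point falls on, and reports the fraction of trees in which each signed feature or signed feature pair appears. The combination with global patterns and the theoretical guarantees described in the abstract are not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy tree format (an assumption for illustration only):
# an internal node is {"feature", "threshold", "left", "right"},
# a leaf is {"value"}.

def signed_path_features(tree, x):
    """Collect (feature, sign) pairs on the decision path of x.

    sign is +1 if x goes right (value above the threshold, "large")
    and -1 if x goes left (value at or below the threshold, "small").
    A feature can appear with both signs if it is split on twice.
    """
    node, seen = tree, set()
    while "value" not in node:
        j = node["feature"]
        if x[j] > node["threshold"]:
            seen.add((j, +1))
            node = node["right"]
        else:
            seen.add((j, -1))
            node = node["left"]
    return seen

def local_cooccurrence(forest, x, max_order=2):
    """Fraction of trees whose path for x contains each signed feature set."""
    counts = Counter()
    for tree in forest:
        feats = sorted(signed_path_features(tree, x))
        for k in range(1, max_order + 1):
            for combo in combinations(feats, k):
                counts[combo] += 1
    return {combo: c / len(forest) for combo, c in counts.items()}

def leaf(v):
    return {"value": v}

# Three hand-built trees over features 0, 1, 2 (made-up illustration data).
forest = [
    {"feature": 0, "threshold": 0.5, "left": leaf(0),
     "right": {"feature": 1, "threshold": 0.5, "left": leaf(0), "right": leaf(1)}},
    {"feature": 1, "threshold": 0.4, "left": leaf(0),
     "right": {"feature": 0, "threshold": 0.6, "left": leaf(0), "right": leaf(1)}},
    {"feature": 2, "threshold": 0.5, "left": leaf(0),
     "right": {"feature": 0, "threshold": 0.5, "left": leaf(0), "right": leaf(1)}},
]

x = [0.9, 0.8, 0.1]  # test point: features 0 and 1 large, feature 2 small
scores = local_cooccurrence(forest, x)

# Print signed feature sets from most to least frequent.
for combo, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(combo, round(score, 3))
```

In this toy forest, the pair of features 0 and 1 appears together, both on the "large value" side, on the test point's path in two of the three trees, which is the kind of local co-occurrence pattern the abstract refers to. With a fitted forest, the paths would come from the trained trees rather than hand-built dicts.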