Deep forest is a non-differentiable deep model that has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of these application fields favor explainable models. Random forests, for example, support feature contributions, which provide a local explanation for each prediction, and Mean Decrease Impurity (MDI), which provides global feature importance. However, deep forest, as a cascade of random forests, is interpretable only at its first layer. From the second layer on, many tree splits occur on the new features generated by the previous layer, which makes existing explanatory tools for random forests inapplicable. To disclose the impact of the original features in the deep layers, we design a calculation method that performs an estimation step followed by a calibration step at each layer, and on this basis we propose feature contribution and MDI feature importance calculation tools for deep forest. Experimental results on both simulated and real-world data verify the effectiveness of our methods.
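For context, the two random-forest tools that the abstract builds on can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions: it uses a hand-built regression tree with a hypothetical `Node` class (not any library's API), uses variance as the impurity, and decomposes a prediction as the root value plus, for each split on the decision path, the change in node value attributed to the split feature. MDI is computed as the sample-weighted impurity decrease per feature, normalized to sum to one. It covers only a single forest (the first-layer case) and is not the paper's estimation-and-calibration method for deeper layers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical tree node, for illustration only.
    value: float                      # mean target of training samples at this node
    n_samples: int                    # number of training samples at this node
    impurity: float                   # variance of those samples (regression impurity)
    feature: Optional[int] = None     # split feature index; None marks a leaf
    threshold: float = 0.0            # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def feature_contributions(root: Node, x: List[float], n_features: int) -> Tuple[float, List[float]]:
    """Return (bias, contributions) so that bias + sum(contributions) equals the leaf value."""
    contrib = [0.0] * n_features
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Credit the change in node value to the feature used for this split.
        contrib[node.feature] += child.value - node.value
        node = child
    return root.value, contrib


def forest_contributions(trees: List[Node], x: List[float], n_features: int) -> Tuple[float, List[float]]:
    """Average per-tree bias and contributions, matching an averaged forest prediction."""
    results = [feature_contributions(t, x, n_features) for t in trees]
    bias = sum(b for b, _ in results) / len(trees)
    contrib = [sum(c[j] for _, c in results) / len(trees) for j in range(n_features)]
    return bias, contrib


def mdi_importance(root: Node, n_features: int) -> List[float]:
    """Sample-weighted impurity decrease per split feature, normalized to sum to 1."""
    total = root.n_samples
    imp = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:
            continue
        l, r = node.left, node.right
        imp[node.feature] += (node.n_samples * node.impurity
                              - l.n_samples * l.impurity
                              - r.n_samples * r.impurity) / total
        stack.extend([l, r])
    s = sum(imp)
    return [v / s for v in imp] if s > 0 else imp


# Toy tree fitted by hand to targets y = [0, 2, 4, 8]:
# root splits on feature 0, its right child splits on feature 1.
tree = Node(3.5, 4, 8.75, feature=0, threshold=0.5,
            left=Node(1.0, 2, 1.0),
            right=Node(6.0, 2, 4.0, feature=1, threshold=0.5,
                       left=Node(4.0, 1, 0.0),
                       right=Node(8.0, 1, 0.0)))

bias, contrib = feature_contributions(tree, [1.0, 1.0], 2)
print(bias, contrib)                # 3.5 [2.5, 2.0]
print(mdi_importance(tree, 2))      # approximately [0.758, 0.242]
```

In a deep forest, the same path-based bookkeeping breaks down from the second layer on, because many splits use features produced by the previous layer rather than original features; the contributions credited to those generated features must then be traced back to the original features, which is the gap the abstract's method targets.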