In credit scoring, machine learning models are known to outperform standard parametric models. Because these models determine access to credit, banking supervisors and internal model validation teams need to monitor their predictive performance and identify the features with the highest impact on that performance. To facilitate this, we introduce the XPER methodology, which decomposes a performance metric (e.g., AUC, $R^2$) into specific contributions associated with the individual features of a classification or regression model. XPER is theoretically grounded in Shapley values and is both model-agnostic and performance metric-agnostic. It can be implemented at either the model level or the individual level. Using a novel dataset of car loans, we decompose the AUC of a machine learning model trained to forecast the default probability of loan applicants. We show that a small number of features can explain a surprisingly large part of the model performance. We also find that the features that contribute the most to the predictive performance of the model may not be the ones that contribute the most to individual forecasts, as measured by SHAP. Finally, we show how XPER can be used to deal with heterogeneity issues and significantly boost out-of-sample performance.
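To make the idea of a Shapley-based performance decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes a value function in which features outside a coalition are jointly shuffled across rows, so that they carry no information about the target; this is one simple way to approximate "removing" a feature and may differ from the exact definition used by XPER. The function names (`auc`, `coalition_value`, `shapley_performance`), the linear scoring rule, and the synthetic data are all hypothetical and stand in for a fitted model and a real loan dataset.

```python
import itertools
import math

import numpy as np


def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U); assumes continuous scores without ties."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def coalition_value(predict, X, y, coalition, metric, rng, n_repeats=20):
    """Metric when only the features in `coalition` keep their observed values.

    Assumption of this sketch: the remaining features are jointly shuffled
    across rows, which breaks their link with y and with the coalition.
    """
    outside = [j for j in range(X.shape[1]) if j not in coalition]
    if not outside:
        return float(metric(y, predict(X)))
    values = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        perm = rng.permutation(len(X))
        X_perm[:, outside] = X[perm][:, outside]
        values.append(metric(y, predict(X_perm)))
    return float(np.mean(values))


def shapley_performance(predict, X, y, metric, seed=0):
    """Return (benchmark, phi) with metric(full model) = benchmark + phi.sum()."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Evaluate every coalition once (2**p evaluations, so only for small p).
    value = {
        S: coalition_value(predict, X, y, S, metric, rng)
        for size in range(p + 1)
        for S in itertools.combinations(range(p), size)
    }
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            w = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for S in itertools.combinations(others, size):
                S_with_j = tuple(sorted(S + (j,)))
                phi[j] += w * (value[S_with_j] - value[S])
    return value[()], phi


# Synthetic illustration (hypothetical data, not the car loan dataset).
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
coef = np.array([2.0, 0.5, 0.0])  # feature 2 is pure noise by construction
y = (X @ coef + rng.logistic(size=n) > 0).astype(int)


def predict(X_):
    """Stands in for the score function of a fitted model."""
    return X_ @ coef


benchmark, phi = shapley_performance(predict, X, y, auc)
print("benchmark AUC:", round(benchmark, 3))
for j, contribution in enumerate(phi):
    print(f"feature {j}: contribution to AUC = {contribution:.3f}")
print("benchmark + contributions:", round(benchmark + phi.sum(), 3))
print("full-model AUC:", round(auc(y, predict(X)), 3))
```

By the efficiency property of Shapley values, the benchmark plus the feature contributions equals the full-model AUC for any value function, so the last two printed numbers agree. Exhaustive enumeration costs $2^p$ coalition evaluations and is practical only for a handful of features; larger models would require a sampling approximation.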