Compared to "black-box" models such as random forests and deep neural networks, explainable boosting machines (EBMs) are considered "glass-box" models: they can be competitively accurate while maintaining a higher degree of transparency and explainability. However, EBMs quickly become less transparent and harder to interpret in high-dimensional settings with many predictor variables, and their longer scoring times make them more difficult to use in production. We propose a simple solution based on the least absolute shrinkage and selection operator (LASSO) that introduces sparsity by reweighting the individual model terms and removing the less relevant ones, allowing these models to retain their transparency and relatively fast scoring times in higher-dimensional settings. In short, using the LASSO to post-process a fitted EBM with many terms (possibly hundreds or thousands) can help reduce the model's complexity and drastically reduce its scoring time. We illustrate the basic idea using two real-world examples with code.
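As a rough illustration of the post-processing step, the sketch below treats each fitted EBM term's contribution f_j(x) as a fixed feature and fits a LASSO on those contributions. It minimizes (1/(2n))·||y − b0 − Σ_j w_j f_j||² + λ·Σ_j |w_j|, and any term whose weight is shrunk exactly to zero can be dropped from the model. This is a minimal sketch assuming a regression (squared-error) setting and a plain NumPy coordinate-descent solver. The function names `soft_threshold` and `lasso_reweight_terms`, the penalty value, and the synthetic data are hypothetical stand-ins, not the authors' code. A binary classifier would instead call for an L1-penalized logistic fit, and in practice the matrix of term contributions would be computed from an actual fitted EBM.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_reweight_terms(term_contribs, y, lam, n_iter=500, tol=1e-8):
    """Hypothetical helper: fit y ~ b0 + sum_j w_j * term_contribs[:, j] with an L1 penalty.

    Minimizes (1 / (2n)) * ||y - b0 - F w||^2 + lam * ||w||_1 by cyclic
    coordinate descent. The intercept b0 is not penalized.
    """
    F = np.asarray(term_contribs, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = F.shape
    # Center columns and response so the unpenalized intercept drops out.
    F_mean, y_mean = F.mean(axis=0), y.mean()
    Fc, yc = F - F_mean, y - y_mean
    col_sq = (Fc ** 2).sum(axis=0) / n  # (1/n) * ||Fc_j||^2 for each term
    w = np.zeros(p)
    resid = yc.copy()  # current residual yc - Fc @ w (w starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant term carries no signal after centering
            # Correlation of term j with the partial residual (its own fit added back).
            rho = Fc[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            if w_new != w[j]:
                resid -= Fc[:, j] * (w_new - w[j])
                max_change = max(max_change, abs(w_new - w[j]))
                w[j] = w_new
        if max_change < tol:
            break
    b0 = y_mean - F_mean @ w
    return b0, w


# Synthetic stand-in for a fitted EBM's term contributions: only the first
# three terms carry signal; the remaining seven are unrelated to y.
rng = np.random.default_rng(0)
n, p = 1000, 10
X = rng.uniform(-1.0, 1.0, size=(n, p))
F = np.column_stack(
    [2.0 * X[:, 0], X[:, 1] ** 2, np.sin(3.0 * X[:, 2])]
    + [0.5 * X[:, j] for j in range(3, p)]
)
y = F[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(n)

b0, w = lasso_reweight_terms(F, y, lam=0.01)
kept = np.flatnonzero(w)  # indices of terms that survive the LASSO
print("kept terms:", kept.tolist())
print("weights:", np.round(w, 3).tolist())
```

Only terms with a nonzero weight need to be evaluated when scoring new data, which is where the savings in scoring time would come from. In a real workflow the penalty λ would typically be chosen by cross-validation rather than fixed as it is here.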