Cross Feature Selection to Eliminate Spurious Interactions and Single Feature Dominance Explainable Boosting Machines

Interpretability is a crucial aspect of machine learning models: it enables humans to understand and trust a model's decision-making process. In many real-world applications, interpretability is essential for legal, ethical, and practical reasons. For instance, in the banking domain, interpretability is critical for lenders and borrowers to understand the reasoning behind the acceptance or rejection of loan applications, in accordance with fair lending laws. However, achieving interpretability is challenging, especially for complex, high-performance models. Explainable Boosting Machines (EBMs) have therefore been gaining popularity, as they combine interpretability with high performance across a variety of prediction tasks. These models can nevertheless suffer from spurious interactions with redundant features and from the dominance of a single feature across all interactions, both of which can undermine the interpretability and reliability of the model's predictions. In this paper, we explore novel approaches to address these issues using alternate cross-feature selection, ensemble features, and changes to the model configuration. Our approach involves a multi-step feature selection procedure that selects a set of candidate features and ensemble features and then benchmarks them using the EBM model. We evaluate our method on three benchmark datasets and show that the alternate techniques outperform vanilla EBM methods, providing better interpretability and feature selection stability while improving the model's predictive performance. Moreover, we show that our approach can identify meaningful interactions and reduce the dominance of single features in the model's predictions, leading to more reliable and interpretable models.

Index Terms: Interpretability, EBMs, ensemble, feature selection.
