Class imbalance is a central challenge in credit card fraud detection because it directly affects predictive reliability in real-world financial systems. To address it, this study proposes an enhanced workflow based on the Explainable Boosting Machine (EBM), a transparent, state-of-the-art implementation of the GA2M algorithm, optimized through systematic hyperparameter tuning, feature selection, and preprocessing refinement. Rather than relying on conventional sampling techniques, which may introduce bias or cause information loss, the optimized EBM balances accuracy and interpretability: it detects fraudulent transactions precisely while providing actionable insights into feature importance and interaction effects. The Taguchi method is used to optimize both the sequence of data scalers and the model hyperparameters, yielding robust, reproducible, and systematically validated performance improvements. Experimental evaluation on benchmark credit card data yields an ROC-AUC of 0.983, surpassing prior EBM baselines (0.975) and outperforming Logistic Regression, Random Forest, XGBoost, and Decision Tree models. These results highlight the potential of interpretable machine learning and data-driven optimization for advancing trustworthy fraud analytics in financial systems.
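The abstract does not give the factors, levels, or orthogonal array used in the Taguchi design, so the following is only a minimal sketch of how such a search could be organized. It assumes four three-level factors and the standard L9 array. The factor levels and the `evaluate` function are hypothetical: `evaluate` returns a synthetic additive score so that the example runs without data, whereas in a real study it would return the cross-validated ROC-AUC of the scaler-plus-EBM pipeline.

```python
from statistics import mean

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, 3 levels each.
# Every pair of columns contains each level combination exactly once.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

# Hypothetical factors and levels (assumptions, not taken from the paper).
FACTORS = {
    "scaler": ["standard", "minmax", "robust"],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_bins": [128, 256, 512],
    "interactions": [0, 10, 20],
}


def evaluate(config):
    """Placeholder score. A real run would return cross-validated ROC-AUC
    of an EBM pipeline built from `config`; these numbers are synthetic."""
    effects = {
        "scaler": {"standard": 0.000, "minmax": 0.002, "robust": 0.004},
        "learning_rate": {0.01: 0.003, 0.05: 0.001, 0.1: 0.000},
        "max_bins": {128: 0.000, 256: 0.002, 512: 0.001},
        "interactions": {0: 0.000, 10: 0.003, 20: 0.002},
    }
    return 0.97 + sum(effects[name][value] for name, value in config.items())


def taguchi_search(factors, array, score_fn):
    """Score each row of the orthogonal array, then pick, for every factor,
    the level with the highest mean score (main-effects analysis)."""
    names = list(factors)
    runs = []
    for row in array:
        config = {name: factors[name][level] for name, level in zip(names, row)}
        runs.append((row, score_fn(config)))

    best = {}
    for j, name in enumerate(names):
        level_means = [
            mean(score for row, score in runs if row[j] == level)
            for level in range(len(factors[name]))
        ]
        best[name] = factors[name][level_means.index(max(level_means))]
    return best, runs


best_config, runs = taguchi_search(FACTORS, L9, evaluate)
print(best_config)
```

The design evaluates 9 configurations instead of the 81 in the full grid. The main-effects step assumes the factors act roughly additively; strong interactions between factors would call for a confirmation run of the selected configuration.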