Validation-Stage Combinatorial Fusion Analysis for Imbalanced Credit-Card Fraud Detection

Credit-card fraud detection is difficult because fraudulent transactions are rare, costly, and unevenly distributed. Strong gradient-boosted tree models already perform well on structured transaction data, so the value of another fusion method is not obvious. This paper examines whether Combinatorial Fusion Analysis (CFA), which searches over model subsets and rank-score fusion rules, can still add value on the IEEE-CIS Fraud Detection benchmark. Using a leakage-free 60/20/20 train/validation/test protocol, we evaluate 480 fusion configurations built from seven base classifiers. The best test-set result comes from diversity-weighted score fusion of Random Forest, XGBoost, and LightGBM (DEF WtScore), with AUC-ROC = 0.9405, AUPRC = 0.6699, and F1 = 0.6373. Bootstrap confidence intervals from 1,000 resamples on the gains over the strongest single model exclude zero for all three metrics. CFA matches soft voting on AUC-ROC, improves on it in AUPRC and F1, and outperforms stacking in this setting. A CTGAN augmentation experiment gives a negative result: synthetic fraud samples degrade both the individual models and CFA. Overall, CFA is most useful here not as a way to combine every classifier, but as a validation-stage method for choosing a small, complementary subset and assigning diversity-aware weights.
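The abstract does not say how the diversity weights are computed, so the following is only a minimal sketch of one plausible reading of diversity-weighted score fusion. It assumes the formulation commonly used in the CFA literature: each model's rank-score characteristic is its normalized scores sorted in descending order, the diversity between two models is the root-mean-square gap between their characteristics, and a model's weight is its mean diversity against the other models in the subset. All function names and the toy score arrays are hypothetical. Weights are derived from validation scores and then applied unchanged to test scores, mirroring the validation-stage use described above.

```python
import numpy as np

def normalize(scores):
    """Min-max scale one model's scores to [0, 1]."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

def rank_score_characteristic(scores):
    """Normalized scores in descending order: the score held at rank 1, 2, ..., n."""
    return np.sort(normalize(scores))[::-1]

def cognitive_diversity(scores_a, scores_b):
    """RMS gap between two rank-score characteristics (assumed definition)."""
    fa = rank_score_characteristic(scores_a)
    fb = rank_score_characteristic(scores_b)
    return float(np.sqrt(np.mean((fa - fb) ** 2)))

def diversity_strengths(score_matrix):
    """Mean diversity of each model against all others; input is (n_models, n_samples)."""
    m = len(score_matrix)
    if m < 2:
        raise ValueError("need at least two models to measure diversity")
    return np.array([
        np.mean([cognitive_diversity(score_matrix[i], score_matrix[j])
                 for j in range(m) if j != i])
        for i in range(m)
    ])

def weighted_score_fusion(score_matrix, weights):
    """Weighted average of normalized scores; uniform weights if all weights are zero."""
    normed = np.vstack([normalize(s) for s in score_matrix])
    w = np.asarray(weights, dtype=float)
    if w.sum() == 0:
        w = np.ones_like(w)
    return w @ normed / w.sum()

# Toy scores from three hypothetical models on validation and test transactions.
val_scores = np.array([[0.9, 0.1, 0.5, 0.2, 0.7],
                       [0.8, 0.2, 0.4, 0.1, 0.9],
                       [0.3, 0.7, 0.6, 0.2, 0.8]])
test_scores = np.array([[0.6, 0.2, 0.9, 0.1, 0.4],
                        [0.7, 0.1, 0.8, 0.3, 0.5],
                        [0.5, 0.4, 0.6, 0.2, 0.9]])

weights = diversity_strengths(val_scores)                 # fitted on validation only
fused_test = weighted_score_fusion(test_scores, weights)  # applied to held-out data
```

In a full search, the same scoring would be repeated for each candidate model subset and fusion rule on the validation split, with only the selected configuration evaluated on the test split.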

