Financial crime costs U.S. institutions over $32 billion each year. Although AI tools for fraud detection have become more advanced, their deployment in production systems still faces a major obstacle: many of these models operate as black boxes that cannot provide the transparent, auditable explanations called for by supervisory guidance such as OCC Bulletin 2011-12 and Federal Reserve SR 11-7. This study makes three main contributions. First, it evaluates explanation quality along two axes: faithfulness (sufficiency and comprehensiveness at k=5, 10, and 15) and stability (Kendall's W across 30 bootstrap samples). XGBoost paired with TreeExplainer achieves near-perfect stability (W=0.9912), while LSTM paired with DeepExplainer is far less stable (W=0.4962). Second, the paper introduces the SHAP-Guided Adaptive Ensemble (SGAE), which dynamically adjusts per-transaction ensemble weights based on SHAP attribution agreement, achieving the highest AUC-ROC among all tested models (0.8837 on the held-out set; 0.9245 under cross-validation). Third, it provides a complete three-architecture evaluation of LSTM, Transformer, and GNN-GraphSAGE on the full 590,540-transaction IEEE-CIS dataset, with GNN-GraphSAGE achieving an AUC-ROC of 0.9248 and an F1 score of 0.6013. All results are mapped directly to OCC, SR 11-7, and BSA-AML regulatory compliance requirements.
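The stability metric can be illustrated with a minimal sketch of Kendall's coefficient of concordance W, computed over feature-importance vectors from repeated bootstrap runs. The sketch uses the standard textbook formula without tie correction. The function name `kendalls_w`, the use of mean absolute attributions as input, and the synthetic data are assumptions made for illustration, not the paper's actual pipeline.

```python
import numpy as np


def kendalls_w(importances: np.ndarray) -> float:
    """Kendall's W for an (m, n) array: m bootstrap runs, n features.

    Each row holds one run's per-feature importance scores (for example,
    mean absolute SHAP values; an assumption of this sketch). Features are
    ranked within each run, and W measures how well the m rankings agree:
    1.0 means identical rankings. Ties are not corrected for.
    """
    m, n = importances.shape
    # argsort of argsort converts scores to 0-based ranks; add 1 for 1-based.
    ranks = importances.argsort(axis=1).argsort(axis=1) + 1
    # Total rank each feature received across all runs.
    rank_sums = ranks.sum(axis=0)
    # Sum of squared deviations of the rank sums from their mean.
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return float(12.0 * s / (m ** 2 * (n ** 3 - n)))


# Hypothetical data: 30 bootstrap runs over 10 features with small noise
# around a fixed importance profile.
rng = np.random.default_rng(0)
base_profile = np.linspace(1.0, 0.1, 10)
runs = base_profile + rng.normal(0.0, 0.01, size=(30, 10))
print(f"W = {kendalls_w(runs):.4f}")
```

In this reading, a value near 1 (as reported for XGBoost with TreeExplainer) means the feature ranking barely changes from one bootstrap sample to the next, whereas a value near 0.5 (as reported for LSTM with DeepExplainer) indicates substantial reshuffling of the ranking.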