Explainable Fraud Detection with Deep Symbolic Classification

From arXiv. 12 pages, 3 figures. To be published in the 3rd International Workshop on Explainable AI in Finance of the 4th ACM International Conference on AI in Finance (ICAIF, https://ai-finance.org/).

There is a growing demand for explainable, transparent, and data-driven models in fraud detection. Decisions made by fraud detection models must be explainable in the event of a customer dispute. In addition, the model's decision-making process must be transparent to win the trust of regulators and business stakeholders. At the same time, fraud detection benefits from data-driven approaches, given the noisy, dynamic nature of fraud and the availability of large historical data sets. Finally, fraud detection is notorious for its class imbalance: legitimate transactions typically outnumber fraudulent ones by several orders of magnitude. In this paper, we present Deep Symbolic Classification (DSC), an extension of the Deep Symbolic Regression framework to classification problems. DSC casts classification as a search problem over the space of all analytic functions built from a vocabulary of variables, constants, and operations, and directly optimizes an arbitrary evaluation metric. The search is guided by a deep neural network trained with reinforcement learning. Because the resulting functions are concise, closed-form mathematical expressions, the model is inherently explainable, both at the level of individual classification decisions and at the level of the model's overall decision process. Furthermore, the class imbalance problem is addressed by optimizing metrics that are robust to class imbalance, such as the F1 score. This eliminates the need for the oversampling and undersampling techniques that plague traditional approaches. Finally, the model allows one to explicitly trade off prediction accuracy against explainability. An evaluation on the PaySim data set demonstrates predictive performance competitive with state-of-the-art models, while surpassing them in explainability. This establishes DSC as a promising model for fraud detection systems.
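
The abstract describes DSC only at a high level. The sketch below illustrates its main ingredients under assumptions that are ours, not the paper's: candidate functions are token lists in prefix (pre-order) notation; the vocabulary (`add`, `sub`, `mul`, `neg`, feature variables `x0` to `x2`, and a constant `c1`) is hypothetical; a transaction is labeled fraud when the expression evaluates to a positive number; the reward is the F1 score minus a hypothetical per-token `length_penalty` that mimics the accuracy/explainability trade-off; and plain random sampling (`sample_expression`, `random_search`) stands in for the reinforcement-learning-trained network that guides the actual search. The data are a small synthetic toy set, not PaySim.

```python
import math
import random

# Hypothetical vocabulary (not the paper's): binary/unary operators,
# feature variables x0..x2, and one constant c1.
BINARY = {"add": lambda a, b: a + b,
          "sub": lambda a, b: a - b,
          "mul": lambda a, b: a * b}
UNARY = {"neg": lambda a: -a}
CONSTANTS = {"c1": 15.0}
TERMINALS = ["x0", "x1", "x2", "c1"]


def evaluate(tokens, row):
    """Evaluate a prefix-notation (pre-order) expression on one feature row."""
    def rec(i):
        tok = tokens[i]
        if tok in BINARY:
            left, j = rec(i + 1)
            right, k = rec(j)
            return BINARY[tok](left, right), k
        if tok in UNARY:
            val, j = rec(i + 1)
            return UNARY[tok](val), j
        if tok in CONSTANTS:
            return CONSTANTS[tok], i + 1
        return row[int(tok[1:])], i + 1  # "x2" -> row[2]

    value, end = rec(0)
    if end != len(tokens):
        raise ValueError("malformed expression")
    return value


def classify(tokens, row):
    # Assumed decision rule: f(x) > 0 (equivalently sigmoid(f(x)) > 0.5) -> fraud.
    return 1 if evaluate(tokens, row) > 0 else 0


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def reward(tokens, X, y, length_penalty=0.01):
    # F1 minus a hypothetical length penalty: raising length_penalty trades
    # accuracy for shorter, easier-to-read expressions.
    preds = [classify(tokens, row) for row in X]
    return f1_score(y, preds) - length_penalty * len(tokens)


def sample_expression(rng, max_depth=3):
    """Random prefix expression; a stand-in for the RL-trained policy network."""
    if max_depth == 0 or rng.random() < 0.3:
        return [rng.choice(TERMINALS)]
    if rng.random() < 0.2:
        return ["neg"] + sample_expression(rng, max_depth - 1)
    op = rng.choice(list(BINARY))
    return ([op] + sample_expression(rng, max_depth - 1)
            + sample_expression(rng, max_depth - 1))


def random_search(X, y, n_samples=2000, seed=0):
    """Keep the highest-reward expression among random samples."""
    rng = random.Random(seed)
    best, best_r = None, -math.inf
    for _ in range(n_samples):
        expr = sample_expression(rng)
        r = reward(expr, X, y)
        if r > best_r:
            best, best_r = expr, r
    return best, best_r


# Synthetic, imbalanced toy data (6 "fraud" rows out of 200); not PaySim.
X = [(i % 20, (i // 20) * 2, i % 3) for i in range(200)]
y = [1 if x0 - x1 > 15 else 0 for x0, x1, _ in X]

if __name__ == "__main__":
    all_legit = [0] * len(y)
    print(f"{accuracy(y, all_legit):.2f}", f1_score(y, all_legit))  # 0.97 0.0
    perfect = ["sub", "sub", "x0", "x1", "c1"]  # (x0 - x1) - 15
    print(f1_score(y, [classify(perfect, r) for r in X]))  # 1.0
    print(random_search(X, y))
```

On this toy set, a classifier that labels every transaction legitimate reaches 97% accuracy but an F1 score of 0, which illustrates why an imbalance-robust metric such as F1 makes a better optimization target than accuracy. The selected expression itself, for example `(x0 - x1) - 15 > 0`, doubles as the explanation of every decision it makes.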

