There is a growing demand for explainable, transparent, and data-driven models in fraud detection. Decisions made by fraud detection models need to be explainable in the event of a customer dispute. Additionally, the model's decision-making process must be transparent to win the trust of regulators and business stakeholders. At the same time, fraud detection lends itself to data-driven solutions because fraud is noisy and dynamic and large historical data sets are available. Finally, fraud detection is notorious for its class imbalance: there are typically several orders of magnitude more legitimate transactions than fraudulent ones. In this paper, we present Deep Symbolic Classification (DSC), an extension of the Deep Symbolic Regression framework to classification problems. DSC casts classification as a search problem in the space of all analytic functions composed from a vocabulary of variables, constants, and operations, and it optimizes an arbitrary evaluation metric directly. The search is guided by a deep neural network trained with reinforcement learning. Because the resulting functions are concise, closed-form mathematical expressions, the model is inherently explainable both at the level of a single classification decision and at the level of the model's overall decision process. Furthermore, class imbalance is addressed by optimizing metrics that are robust to it, such as the F1 score. This eliminates the need for the oversampling and undersampling techniques that plague traditional approaches. Finally, the model allows an explicit trade-off between predictive accuracy and explainability. An evaluation on the PaySim data set demonstrates predictive performance competitive with state-of-the-art models, while surpassing them in terms of explainability. This establishes DSC as a promising model for fraud detection systems.
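To make the idea of classification as a search over closed-form expressions more concrete, the following is a minimal sketch, not the DSC implementation. It assumes a tiny hand-written set of candidate expressions in place of the neural-network-guided reinforcement learning search, hypothetical feature names (`amount`, `old_balance`), made-up toy transactions, and a decision rule that thresholds the expression's output at zero (equivalent to thresholding a sigmoid at 0.5). It only illustrates scoring each candidate directly by its F1 score and keeping the best one.

```python
# Toy illustration: pick the closed-form expression with the best F1 score.
# Assumptions (not from the paper): the candidate set, the feature names,
# the data, and the zero threshold are all hypothetical.

def f1_score(y_true, y_pred):
    """F1 score for binary labels, with 1 as the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def predict(expr, rows):
    """Classify a row as fraud (1) when the expression's output is positive."""
    return [1 if expr(row) > 0 else 0 for row in rows]

def search(candidates, rows, labels):
    """Exhaustively score every candidate; a stand-in for the guided search."""
    scores = {name: f1_score(labels, predict(expr, rows))
              for name, expr in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Imbalanced toy data: (amount, old_balance, is_fraud), 2 frauds in 10 rows.
DATA = [
    (100, 5000, 0), (250, 300, 0), (4000, 9000, 0), (50, 50000, 0),
    (1200, 1500, 0), (3000, 3000, 1), (800, 20000, 0), (700, 700, 1),
    (60, 100, 0), (2000, 7000, 0),
]
rows = [{"amount": a, "old_balance": b} for a, b, _ in DATA]
labels = [y for _, _, y in DATA]

# Each candidate is a readable formula paired with a function evaluating it.
CANDIDATES = {
    "amount - 1000": lambda r: r["amount"] - 1000,
    "amount - old_balance + 0.5": lambda r: r["amount"] - r["old_balance"] + 0.5,
    "1": lambda r: 1,    # always predicts fraud
    "-1": lambda r: -1,  # always predicts legitimate
}

best_name, best_f1 = search(CANDIDATES, rows, labels)
print(f"best expression: {best_name} (F1 = {best_f1:.2f})")
```

On this toy data, the always-legitimate candidate `-1` is 80% accurate yet scores an F1 of zero, which illustrates why optimizing a metric such as F1 directly can sidestep resampling. The selected expression is itself the explanation: a transaction is flagged when its amount is at least the account's prior balance.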