In the context of the continuous digitalisation of processes, organisations face the challenge of detecting anomalies that may reveal suspicious activities in an ever-increasing volume of data. To this end, audit engagements are carried out regularly, and internal auditors and purchase specialists are constantly looking for new methods to automate these processes. This work proposes a methodology to prioritise the investigation of the cases detected in two large real-world purchase datasets. The goal is to improve the effectiveness of companies' control efforts and the efficiency with which such tasks are carried out. A comprehensive Exploratory Data Analysis is carried out before applying unsupervised Machine Learning techniques aimed at detecting anomalies. A univariate approach is applied through the z-Score index and the DBSCAN algorithm, while a multivariate analysis is implemented with the k-Means and Isolation Forest algorithms and the Silhouette index, with each method yielding a proposed set of candidate transactions to be reviewed. An ensemble prioritisation of the candidates is provided, together with a proposal of explainability methods (LIME, Shapley, SHAP) to help the company's specialists understand the results.
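As an illustration only, and not the authors' implementation, the idea of combining per-method candidate lists into one ensemble ranking can be sketched as a simple vote count. The function names `zscore_flags` and `prioritise`, the threshold of 3, the toy amounts, and the flags attributed to the other detectors are all assumptions made for this sketch.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore_flags(amounts, threshold=3.0):
    """Return the indices whose absolute z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return set()  # all values identical: nothing stands out
    return {i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold}


def prioritise(flag_sets):
    """Rank candidates by the number of detectors that flagged them.

    flag_sets maps a detector name to the set of flagged transaction indices.
    Ties are broken by transaction index so the ordering is deterministic.
    """
    votes = Counter()
    for flags in flag_sets.values():
        votes.update(flags)
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy purchase amounts: nineteen ordinary transactions and one large one.
amounts = [100.0] * 19 + [5000.0]

# The z-score flags are computed; the other two sets are made-up placeholders
# standing in for the output of multivariate or density-based detectors.
flag_sets = {
    "zscore": zscore_flags(amounts),
    "isolation_forest": {3, 19},
    "dbscan": {19},
}

ranking = prioritise(flag_sets)
for index, n_votes in ranking:
    print(f"transaction {index}: flagged by {n_votes} detector(s)")
```

A plain vote count is only one possible way to build such a ranking; weighting detectors or combining normalised anomaly scores are alternatives that this sketch does not cover.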