In fraud detection, comprehensive and privacy-compliant datasets are crucial for advancing machine learning research and building effective anti-fraud systems. Existing datasets typically focus on transaction-level information. This is useful, but it overlooks the broader context of customer behavior that is essential for detecting sophisticated fraud schemes. Customer-level data of this kind is scarce, largely because of privacy concerns, and this scarcity significantly hampers the development and testing of predictive models that operate at the customer level. To address this gap, we introduce a benchmark of structured datasets designed specifically for customer-level fraud detection. The benchmark adheres to strict privacy guidelines to protect user confidentiality while providing a rich source of information through customer-centric features. It supports comprehensive evaluation of a range of machine learning models, giving a deeper understanding of their strengths and weaknesses in predicting fraudulent activity. With this work, we aim to close the gap in data availability and offer researchers and practitioners a valuable resource for developing next-generation fraud detection techniques.